House Price Prediction
by
B Rama Krishna
I19MA048
OUTLINE
Introduction
Problem Statement
Project Specification
Dataset
Pipeline
Data Cleaning
Feature Engineering
Outlier Detection
Outlier Removal
One Hot Encoding
Model Creation
K-fold cross validation
Grid search cross validation
Result/Output
INTRODUCTION
⚫ In this project, we have built a machine learning model to predict house prices in an Indian city (Bengaluru).
⚫ This project will be very helpful for the real estate market.
⚫ Our model can be used by both house sellers and house buyers.
⚫ The Multiple Linear Regression algorithm is used to create a model with a high accuracy score.
PROBLEM STATEMENT
⚫ Prices of real estate properties are closely linked with our economy.
⚫ Despite this, we do not have accurate measures of house prices based on the vast amount of data available.
⚫ Proper and justified property prices can bring a lot of transparency and trust back to the real estate industry, which is very important for most consumers, especially in India.
PROJECT SPECIFICATION
⚫ The goal of this project is to predict house prices in Bengaluru city based on features such as location, size/area, number of bedrooms, and number of bathrooms.
⚫ The Bengaluru house price dataset is used to create the model.
⚫ We use a machine learning algorithm to create a predictive model.
⚫ The Multiple Linear Regression algorithm is used to train and test the model in our project.
DATASET
The dataset comes from kaggle.com.
Link to the dataset: https://www.kaggle.com/amitabhajoy/bengaluru-house-price-data
There are 13320 observations in our dataset.
There are a total of 9 columns/attributes in our dataset: area_type, availability, location, size, total_sqft, bath, society, balcony, and price.
We have used a total of 5 features to train our machine learning model.
PIPELINE
Data Cleaning → Feature Engineering → Outlier Detection → Outlier Removal → One Hot Encoding → Model Creation
DATA CLEANING
⚫ The main aim of data cleaning is to identify and remove errors and duplicate data in order to create a reliable dataset.
⚫ Data cleaning is done using the widely used pandas library.
⚫ Initially, the columns/features that are not important in deciding the final price are dropped from our dataset.
⚫ Rows having a null value in any column are dropped from our dataset.
⚫ Columns containing both characters and integer values are converted to integer values only.
EXAMPLE:
bedrooms (before)    bedrooms (after)
2 bhk                2
4 bedrooms           4
Many more data cleaning techniques are used to improve the quality of our dataset.
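The cleaning steps above can be sketched in pandas as follows. This is a minimal sketch rather than the project's actual code: the tiny sample frame, the choice of which column to drop, and the helper name to_bedroom_count are assumptions made for illustration.

```python
import pandas as pd

# Tiny made-up sample; column names follow the dataset description,
# the values are illustrative only.
df = pd.DataFrame({
    "area_type": ["Super built-up Area", "Plot Area", "Built-up Area"],
    "location": ["Whitefield", "1st Block Jayanagar", None],
    "size": ["2 bhk", "4 bedrooms", "3 bhk"],
    "total_sqft": [1056.0, 2600.0, 1440.0],
    "bath": [2.0, 5.0, 2.0],
    "price": [39.07, 120.0, 62.0],
})

# Step 1: drop a column assumed not to matter for the final price.
df = df.drop(columns=["area_type"])

# Step 2: drop rows that have a null value in any column.
df = df.dropna()


def to_bedroom_count(size: str) -> int:
    """Keep only the leading integer of strings such as '2 bhk'."""
    return int(size.split(" ")[0])


# Step 3: convert the mixed text/number column into plain integers.
df["bedrooms"] = df["size"].apply(to_bedroom_count)
df = df.drop(columns=["size"]).reset_index(drop=True)

print(df)
```

The row with the missing location is removed in step 2, so only the "2 bhk" and "4 bedrooms" rows reach the conversion in step 3.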
FEATURE ENGINEERING
⚫ Feature engineering is the process of using domain knowledge to extract features from raw data via data mining techniques. These features can be used to improve the performance of machine learning algorithms. Feature engineering can be considered applied machine learning itself.
⚫ Dimensionality reduction techniques are applied to our dataset to cut down the data that is not very important in deciding the house price.
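One common way to reduce dimensionality for a column such as location is to merge rarely occurring values into a single bucket before encoding. The sketch below illustrates that idea only. The slides do not say which reduction was actually applied, so the threshold, the label "other", and the function name group_rare_locations are all assumptions.

```python
from collections import Counter


def group_rare_locations(locations, min_count=2, other_label="other"):
    """Replace locations seen fewer than min_count times with other_label.

    min_count and other_label are illustrative defaults, not project values.
    """
    counts = Counter(locations)
    return [loc if counts[loc] >= min_count else other_label for loc in locations]


# Made-up sample of location values.
locations = ["Whitefield", "Whitefield", "Hebbal", "Yelahanka", "Hebbal", "Sarjapur"]
reduced = group_rare_locations(locations)

print(reduced)
print(len(set(locations)), "distinct values before,", len(set(reduced)), "after")
```

Fewer distinct location values means fewer columns once the location is one hot encoded.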
OUTLIER DETECTION
⚫ In simple words, an outlier is an observation that diverges from the overall pattern in a sample.
⚫ There are many types of outlier detection techniques, such as Z-Score or Extreme Value Analysis, Probabilistic and Statistical Modeling, Information Theory Models, Standard Deviation, etc.
⚫ We have used simple domain knowledge of the real estate market to detect the outliers in our dataset.
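Of the techniques listed above, the Z-score is the simplest to show in code. The sketch below is for illustration only, since the project itself relied on domain knowledge rather than Z-scores. The threshold of 2.0, the sample prices, and the function name zscore_outliers are assumptions.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=2.0):
    """Return the values whose absolute Z-score exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing can be an outlier
    return [v for v in values if abs(v - mu) / sigma > threshold]


# Made-up prices with one value far away from the rest.
prices = [40, 42, 45, 47, 50, 52, 55, 58, 60, 400]
print(zscore_outliers(prices))
```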
OUTLIER REMOVAL
⚫ After detecting an outlier, correct the error if possible; if it cannot be fixed, remove that observation.
⚫ In our dataset, we observed unusual variations in the relation between the values of some attributes.
⚫ Rows of this type are dropped from the dataset.
⚫ Scatter plots are used to detect some more outliers, and they are also removed from our dataset.
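A domain-knowledge rule of the kind described above can be written as a simple filter. The rule used here (a home should offer at least 300 sqft per bedroom) and all sample values are assumptions chosen for illustration; the slides do not state the project's actual rules or thresholds.

```python
MIN_SQFT_PER_BEDROOM = 300  # assumed threshold, not taken from the slides

# Made-up rows; the second one packs 6 bedrooms into 600 sqft.
homes = [
    {"location": "Whitefield", "total_sqft": 1200, "bedrooms": 2, "price": 60.0},
    {"location": "Hebbal", "total_sqft": 600, "bedrooms": 6, "price": 85.0},
    {"location": "Yelahanka", "total_sqft": 1500, "bedrooms": 3, "price": 75.0},
]


def is_plausible(home):
    """Check the area-per-bedroom relation against the assumed minimum."""
    return home["total_sqft"] / home["bedrooms"] >= MIN_SQFT_PER_BEDROOM


cleaned = [h for h in homes if is_plausible(h)]
removed = [h for h in homes if not is_plausible(h)]

print("kept:", [h["location"] for h in cleaned])
print("removed:", [h["location"] for h in removed])
```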
ONE HOT ENCODING
⚫ This technique is used to convert categorical variables into numeric values.
⚫ Our dataset contains one categorical variable, “location”.
⚫ We have used the one hot encoding method to convert it into numeric values.
For example, for a house located in 1st Block Jayanagar, the 1st Block Jayanagar column has the value 1 and all other location columns have the value 0.
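One hot encoding of the location column can be sketched with the pandas get_dummies function. The sample rows are made up, and whether the project used get_dummies or another encoder is an assumption.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the dataset description.
df = pd.DataFrame({
    "location": ["1st Block Jayanagar", "Whitefield", "Hebbal"],
    "total_sqft": [2850.0, 1200.0, 1000.0],
})

# One 0/1 column per distinct location value.
dummies = pd.get_dummies(df["location"], dtype=int)

# Replace the text column with its encoded columns.
encoded = pd.concat([df.drop(columns=["location"]), dummies], axis=1)

print(encoded)
```

In the first row, the 1st Block Jayanagar column holds 1 and the other location columns hold 0, which matches the description above.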
MODEL CREATION
⚫ Modeling means training a machine learning algorithm to predict the labels from the features.
⚫ We have used the Linear Regression algorithm for training and testing the model.
⚫ The accuracy score of our model is 87%, which is pretty good.
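Training and testing a linear regression model can be sketched with scikit-learn as shown below. The data is synthetic and the 80/20 split is an assumption, so the printed score says nothing about the project's 87%. Note that for regressors the scikit-learn score method returns the R² coefficient of determination; that the slides' accuracy figure is this same quantity is also an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: columns are sqft, bath, bhk (values are made up).
rng = np.random.default_rng(0)
sqft = rng.uniform(500, 3000, size=200)
bath = rng.integers(1, 5, size=200)
bhk = rng.integers(1, 5, size=200)
X = np.column_stack([sqft, bath, bhk])

# Made-up price rule plus noise, only so there is something to fit.
y = 0.05 * sqft + 5.0 * bath + 3.0 * bhk + rng.normal(0, 5, size=200)

# Hold out 20% of the rows for testing (assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=10
)

model = LinearRegression()
model.fit(X_train, y_train)

# score() returns R^2 on the held-out rows.
r2 = model.score(X_test, y_test)
print(f"R^2 on the test split: {r2:.2f}")
```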
K-FOLD CROSS VALIDATION
⚫ Cross-validation is a statistical method used to estimate the skill of machine learning models.
⚫ It is commonly used in applied machine learning to compare and select a model for a given predictive modeling problem because it is easy to understand, easy to implement, and results in skill estimates that generally have a lower bias than other methods.
⚫ After applying the k-fold cross validation method to our final dataset, we find that our accuracy score is more than 80% every time.
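In k-fold cross validation the data is split into k parts; the model is trained on k-1 of them and scored on the remaining one, once per part. A minimal sketch with scikit-learn follows. The synthetic data, the choice of k = 5, and the shuffling are assumptions, not the project's settings.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data from a made-up linear rule plus noise.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(100, 3))
y = X @ np.array([4.0, 2.0, 1.0]) + rng.normal(0, 1.0, size=100)

# Five folds, shuffled with a fixed seed for repeatability.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# One R^2 score per fold.
scores = cross_val_score(LinearRegression(), X, y, cv=cv)

print("score per fold:", np.round(scores, 3))
print("mean score:", round(scores.mean(), 3))
```

Looking at every fold's score, rather than a single train/test split, shows whether the model performs consistently across different parts of the data.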
GRID SEARCH CROSS VALIDATION
⚫ This technique is used to find the best algorithm for modeling, along with its best parameters.
⚫ We applied the grid search cross validation method to our dataset with the Linear Regression, Lasso Regression, and Decision Tree algorithms.
⚫ We find that the Linear Regression algorithm gives the best accuracy score, more than 80%.
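Comparing the three algorithms with grid search can be sketched using the scikit-learn GridSearchCV class, run once per algorithm. The parameter grids, the synthetic data, and the dictionary name candidates are assumptions; the slides do not list the grids that were actually searched.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.tree import DecisionTreeRegressor

# Synthetic data from a made-up linear rule plus noise.
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(120, 3))
y = X @ np.array([4.0, 2.0, 1.0]) + rng.normal(0, 1.0, size=120)

# Each candidate: an estimator plus an illustrative parameter grid.
candidates = {
    "linear_regression": (LinearRegression(), {"fit_intercept": [True, False]}),
    "lasso": (Lasso(), {"alpha": [0.1, 1.0]}),
    "decision_tree": (DecisionTreeRegressor(random_state=0), {"max_depth": [3, 5]}),
}

cv = KFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=cv)
    search.fit(X, y)
    # Best mean cross-validated R^2 and the parameters that produced it.
    results[name] = (search.best_score_, search.best_params_)

for name, (score, params) in results.items():
    print(name, round(score, 3), params)

best_model = max(results, key=lambda name: results[name][0])
print("best:", best_model)
```

Which algorithm wins depends on the data; on the project's dataset the slides report Linear Regression as the best.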
RESULT/OUTPUT
⚫ We created a function to predict the house price.
⚫ Our function looks like “predict_price(location, sqft, bath, bhk)”.
⚫ When we pass values into the function, it predicts the house price for us.
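A function with the signature above could be sketched as follows. Only the signature comes from the slides. The column layout, the made-up price rule used to create training data, and the handling of unknown locations (all location columns left at 0) are assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumed column layout: three numeric features, then one column per location.
numeric_columns = ["total_sqft", "bath", "bhk"]
location_columns = ["1st Block Jayanagar", "Whitefield"]
columns = numeric_columns + location_columns

# Made-up training rows so the sketch has a fitted model to call.
X = np.array([
    [1000, 2, 2, 1, 0],
    [1500, 3, 3, 1, 0],
    [1200, 2, 3, 1, 0],
    [900, 1, 2, 0, 1],
    [1600, 3, 3, 0, 1],
    [1100, 2, 2, 0, 1],
    [2000, 4, 4, 0, 0],
    [800, 1, 1, 0, 0],
    [1300, 2, 3, 0, 0],
], dtype=float)

# Made-up noiseless price rule, used only to generate illustrative targets.
true_coefficients = np.array([0.05, 5.0, 3.0, 40.0, 10.0])
y = X @ true_coefficients

model = LinearRegression().fit(X, y)


def predict_price(location, sqft, bath, bhk):
    """Build one feature row in the assumed column order and predict."""
    x = np.zeros(len(columns))
    x[0], x[1], x[2] = sqft, bath, bhk
    if location in location_columns:
        # Set the matching one hot column; unknown locations stay all zero.
        x[columns.index(location)] = 1.0
    return float(model.predict([x])[0])


print(predict_price("1st Block Jayanagar", 1000, 2, 2))
print(predict_price("Whitefield", 1000, 2, 2))
```

The location argument is turned into the same one hot layout the model was trained on, which is why the column order has to be kept alongside the model.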
Another way to check the correlation between attributes is to use the pandas scatter_matrix() function, which plots each numeric attribute against every other numeric attribute.
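A minimal sketch of calling scatter_matrix is given below. The sample values are made up, and the off-screen "Agg" backend is chosen only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; an assumption for headless runs

import pandas as pd
from pandas.plotting import scatter_matrix

# Made-up numeric sample using column names from the dataset description.
df = pd.DataFrame({
    "total_sqft": [1056.0, 2600.0, 1440.0, 1521.0, 1200.0],
    "bath": [2.0, 5.0, 2.0, 3.0, 2.0],
    "price": [39.07, 120.0, 62.0, 95.0, 51.0],
})

# Returns a grid of axes: one scatter plot per pair of columns,
# with a histogram of each column on the diagonal.
axes = scatter_matrix(df, figsize=(8, 8))
fig = axes[0, 0].figure

print(axes.shape)
```

To keep the picture, the figure can be written to disk with fig.savefig and a file name of your choice.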
THANK YOU
