2. PROBLEM STATEMENT
From the given dataset, we have to build a model that predicts house price from different features. Since house price is a continuous variable, this is a regression problem, and we will implement Linear Regression to predict the Target variable (House Price).
3. DATA DESCRIPTION
The dataset has 81 columns.
Our Target Variable is SalePrice.
Id is just an index, so we can drop it; it is not needed for prediction.
There are many missing values in this dataset.
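The two preparation steps above (dropping Id and checking for missing values) can be sketched with pandas. This is only a minimal illustration: the small frame below is made up and stands in for the real dataset, and the LotFrontage column name is an assumption that does not appear in these slides.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the real dataset (values are illustrative only).
df = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "LotFrontage": [65.0, np.nan, 68.0, np.nan],  # assumed column name
    "GrLivArea": [1500, 1200, 1800, 1700],
    "SalePrice": [200000, 180000, 220000, 140000],
})

# Id is only a row index, so drop it before modelling.
df = df.drop(columns=["Id"])

# Count missing values per column and keep only the columns that have any.
missing = df.isnull().sum()
missing = missing[missing > 0].sort_values(ascending=False)
print(missing)
```

On the real data, the same `isnull().sum()` call would list every column with missing values, which helps decide between imputing and dropping.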
5. TARGET VARIABLE
• From these plots we can observe that the Target Variable is right-skewed. A log transformation can bring it close to a normal distribution.
6. NORMAL DISTRIBUTION TARGET VARIABLE
• After the log transformation, the Target variable is approximately normally distributed. This helps with the normality assumption of the Linear Regression model, which strictly applies to the residuals.
• Skew is 0.12133506220520406
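The effect of the log transformation on skewness can be sketched as follows. The prices here are synthetic log-normal draws, not the real SalePrice column, so the printed values will not match the skew reported above. The use of `np.log` and pandas' `Series.skew()` is an assumption about how the figure was computed.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed prices (log-normal draws) standing in for SalePrice.
rng = np.random.default_rng(0)
sale_price = pd.Series(rng.lognormal(mean=12.0, sigma=0.4, size=1000), name="SalePrice")

# Sample skewness of the raw prices: positive because of the long right tail.
skew_before = sale_price.skew()

# The log transformation compresses the right tail.
log_price = np.log(sale_price)
skew_after = log_price.skew()

print(f"skew before: {skew_before:.3f}, after: {skew_after:.3f}")
```

A skew close to zero after the transformation indicates a roughly symmetric distribution.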
12. BUILDING A LINEAR REGRESSION MODEL
• Scale the data with a MinMax scaler
• Split the given dataset into train and test sets in an 80:20 ratio
• Target variable is approximately normally distributed (after the log transformation)
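The steps above can be sketched with scikit-learn. This is a minimal sketch on synthetic data, not the original notebook: the feature matrix, coefficients and random seeds are made up. It also fits the scaler on the training rows only, which is one common ordering and may differ from the order used in the original work.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic features and log-price target; the real data is not reproduced here.
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 3))
y = 11.0 + X @ np.array([0.10, 0.05, 0.02]) + rng.normal(0, 0.05, size=200)

# 80:20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the MinMax scaler on the training rows, then apply it to both sets.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Ordinary least squares linear regression on the scaled features.
model = LinearRegression()
model.fit(X_train_scaled, y_train)
print("R^2 on test set:", model.score(X_test_scaled, y_test))
```

Because the target is on the log scale, predictions would need `np.exp` applied to return to the original price units.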
16. CONCLUSION
• People pay more for better quality (OverallQual).
• People pay more for a larger living area (GrLivArea).
• House prices appear to decrease with age, but this needs further confirmation (YrSold).
• In this analysis, we used some of the explanatory variables, such as OverallQual, GrLivArea, GarageCars, GarageArea, TotalBsmtSF and 1stFlrSF.
• These have a linear relationship with the target variable.
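One simple way to check a linear relationship with the target is the Pearson correlation. The sketch below uses synthetic data with made-up coefficients, so its output illustrates the technique only and is not a result from the real dataset. The LogSalePrice column name is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic stand-ins for two of the explanatory variables and the log target.
rng = np.random.default_rng(1)
n = 300
overall_qual = rng.integers(1, 11, size=n)    # quality score from 1 to 10
gr_liv_area = rng.normal(1500, 400, size=n)   # living area
log_price = (
    10.5 + 0.12 * overall_qual + 0.0003 * gr_liv_area + rng.normal(0, 0.1, size=n)
)

df = pd.DataFrame({
    "OverallQual": overall_qual,
    "GrLivArea": gr_liv_area,
    "LogSalePrice": log_price,  # assumed name for the transformed target
})

# Pearson correlation of each feature with the target, strongest first.
corr = df.corr()["LogSalePrice"].drop("LogSalePrice").sort_values(ascending=False)
print(corr)
```

A scatter plot of each feature against the target is a useful companion check, since correlation alone can hide non-linear patterns.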