2. PROBLEM DEFINITION
• Predicting house prices in the city of Tehran, Iran, using the housing data.
• The predicted house price is affected by various factors such as size, total area in square feet, location, total number of bedrooms and number of bathrooms.
3. DATA EXPLORATION
• The dataset has 3479 records and 8 variables.
• There are 4 categorical variables (parking, warehouse, address and elevator) and 4 continuous variables (area, room, price in Toman and price in USD).
6. IMPLEMENTATION STEPS
1. Importing Data
2. Data Cleaning and data preprocessing
3. Feature Engineering
4. Removal of Outliers
5. Model Building
6. Testing the regression model
7. Prediction and accuracy of the model
7. 1) DATA CLEANING STEPS
1) Dropping unnecessary columns
2) Filling missing values in columns
3) Transforming the categorical variables into numerical ones by encoding
4) Creating a value for the address column to store the location of the address
5) Cleaning the Area in Sq_Ft column
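The cleaning steps above could look roughly like the following plain-Python sketch. The column names, the dropped column, the "Unknown" placeholder and the sample rows are all assumptions made for illustration, since the slides do not list them. Step 4 (the address value) is not shown here.

```python
# Hypothetical raw rows; names and values are illustrative, not taken from the dataset.
raw_rows = [
    {"Area": "63", "Room": 1, "Parking": True, "Warehouse": True,
     "Elevator": True, "Address": "A", "Price": 100.0, "Price(USD)": 1.0},
    {"Area": "1,200", "Room": 3, "Parking": False, "Warehouse": True,
     "Elevator": False, "Address": None, "Price": 900.0, "Price(USD)": 9.0},
    {"Area": "abc", "Room": 2, "Parking": True, "Warehouse": False,
     "Elevator": True, "Address": "B", "Price": 300.0, "Price(USD)": 3.0},
]

def clean_area(value):
    """Convert an area string such as '1,200' to a float; return None if not numeric."""
    try:
        return float(str(value).replace(",", "").strip())
    except ValueError:
        return None

def clean_row(row, drop_columns=("Price(USD)",),
              bool_columns=("Parking", "Warehouse", "Elevator")):
    """Apply the cleaning steps to one row and return a new dict."""
    # 1) Drop unnecessary columns (which columns to drop is an assumption).
    cleaned = {k: v for k, v in row.items() if k not in drop_columns}
    # 2) Fill missing values (here: a placeholder for a missing address).
    if cleaned.get("Address") is None:
        cleaned["Address"] = "Unknown"
    # 3) Encode the Boolean categorical columns as 0/1 integers.
    for col in bool_columns:
        cleaned[col] = int(bool(cleaned.get(col)))
    # 5) Clean the area column so it holds numbers only.
    cleaned["Area"] = clean_area(cleaned["Area"])
    return cleaned

clean_rows = [clean_row(r) for r in raw_rows]
```

Rows whose area cannot be parsed end up with `None`, so they can be inspected or dropped in a later step rather than silently coerced.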
8. 2) FEATURE ENGINEERING STEPS
• The Boolean data in three columns (parking, warehouse and elevator) was changed to integer to facilitate model building.
• The address variable was changed to numeric, since location plays an important role in the analysis.
• A value for each address and a room-per-area feature were added.
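The slides do not say how the address value is computed. One possible reading, sketched below, is the mean price of the houses at each address, with room per area taken as rooms divided by area. Both definitions, and the names `Address_Value` and `Room_Per_Area`, are assumptions.

```python
from collections import defaultdict

def add_features(rows):
    """Return new rows with 'Address_Value' and 'Room_Per_Area' added."""
    # Accumulate [sum of prices, count] per address.
    totals = defaultdict(lambda: [0.0, 0])
    for row in rows:
        totals[row["Address"]][0] += row["Price"]
        totals[row["Address"]][1] += 1
    # Assumed definition: address value = mean price at that address.
    address_value = {addr: total / count for addr, (total, count) in totals.items()}
    return [
        {**row,
         "Address_Value": address_value[row["Address"]],
         "Room_Per_Area": row["Room"] / row["Area"]}
        for row in rows
    ]

# Made-up rows, only to show the calculation.
houses = [
    {"Address": "A", "Room": 2, "Area": 100.0, "Price": 300.0},
    {"Address": "A", "Room": 1, "Area": 50.0, "Price": 100.0},
    {"Address": "B", "Room": 3, "Area": 150.0, "Price": 600.0},
]
featured = add_features(houses)
```

Because this encoding is built from the price itself, it would normally be computed on the training rows only, so that test prices do not leak into the features.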
9. 3) OUTLIER DETECTION AND REMOVAL
• Removal of houses with an area above 1000 square feet for the same location
• Removing houses with missing values for address
• Data visualization
• Performing one-hot encoding
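A minimal sketch of the filtering and encoding steps, in plain Python. The 1000 square feet threshold comes from the slide; applying it to every row regardless of location is a simplification, and the column names and sample rows are hypothetical.

```python
def remove_outliers(rows, max_area=1000.0):
    """Keep rows that have an address and an area at or below the threshold."""
    return [r for r in rows
            if r["Address"] is not None and r["Area"] <= max_area]

def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if r[column] == cat else 0
        encoded.append(new_row)
    return encoded

# Made-up rows: one oversized house and one with a missing address.
sample = [
    {"Address": "A", "Area": 80.0},
    {"Address": "B", "Area": 1500.0},
    {"Address": None, "Area": 60.0},
    {"Address": "C", "Area": 120.0},
]
kept = remove_outliers(sample)
encoded = one_hot(kept, "Address")
```

Filtering before encoding means that categories which only appear in removed rows do not produce empty indicator columns.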
11. 4) MODEL BUILDING
1. Split the data into dependent and independent columns.
2. Split the data into training and testing sets.
3. Build the model.
4. Measure accuracy and prediction.
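The two splitting steps can be sketched in plain Python as follows. The feature names, the 80/20 ratio and the seed are assumptions; the slides do not state which values were used.

```python
import random

# Hypothetical feature and target names; the slides do not give exact column names.
FEATURES = ["Area", "Room", "Parking", "Address_Value"]
TARGET = "Price"

def split_features_target(rows, features=FEATURES, target=TARGET):
    """Separate the independent variables (X) from the dependent variable (y)."""
    X = [[row[name] for name in features] for row in rows]
    y = [row[target] for row in rows]
    return X, y

def split_train_test(X, y, test_fraction=0.2, seed=42):
    """Shuffle row indices reproducibly, then hold out a fraction for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test

# Ten made-up rows, only to show the shapes involved.
rows = [{"Area": 50.0 + 10 * i, "Room": 1 + i % 3, "Parking": i % 2,
         "Address_Value": 100.0 + i, "Price": 1000.0 + 50 * i}
        for i in range(10)]
X, y = split_features_target(rows)
X_train, X_test, y_train, y_test = split_train_test(X, y)
```

Fixing the seed keeps the split reproducible, so accuracy figures from different runs can be compared.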
12. 5) TESTING REGRESSION MODELS AND PREDICTION OF HOUSE PRICE
• Use regression to test the data.
• Predict the price.
15. A) LINEAR REGRESSION
• Linear regression attempts to model the relationship between two variables by fitting a linear equation to observed data. One variable is considered to be an explanatory variable, and the other is considered to be a dependent variable.
• The most common method for fitting a regression line is the method of least squares. This method calculates the best-fitting line for the observed data by minimizing the sum of the squares of the vertical deviations from each data point to the line.
• A linear regression line has an equation of the form Y = a + bX, where X is the explanatory variable and Y is the dependent variable. The slope of the line is b, and a is the intercept (the value of Y when X = 0).
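As an illustration of the least-squares method for a single explanatory variable, the sketch below uses the standard closed-form solution: the slope b is the sum of products of deviations divided by the sum of squared deviations of X, and the intercept is a = mean(Y) - b * mean(X). The function name and the data points are made up for the example.

```python
def fit_line(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of X, and sum of products of deviations.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept: value of Y when X = 0
    return a, b

# Made-up points lying exactly on Y = 1 + 2X.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(x, y)
predicted = [a + b * xi for xi in x]
```

With several explanatory variables, as in the house price model, the same idea applies, but the coefficients are solved jointly rather than one at a time.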
18. REGRESSION
• The price is positively related to the total number of rooms, the value of the address and the total area in square feet.
• The house price is negatively related to the value of parking.
• The R squared value is 0.69, which means the model explains roughly 70% of the variance in house price.
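R squared is the share of the variance in the dependent variable that the model accounts for: one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it in plain Python on made-up numbers, not the project's data.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up values: SS_res = 1.0 and SS_tot = 5.0, so R squared = 1 - 1/5.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 1.5, 3.5, 3.5]
score = r_squared(y_true, y_pred)
```

A score of 0.69 read this way says that about 69% of the variation in price is captured by the fitted line. It is not the same as saying 69% of predictions are correct.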
The dataset contains 3479 records and 8 variables: area, room, parking, warehouse, elevator, address, price in Toman and price in USD.
The mean price is 1.7K in USD and 5.35K in Toman. The minimum price in USD is 1.20 and the maximum price in USD is 3.08. The mean number of rooms per house is 2.07, and the maximum number of rooms is 5.
The price of the house increases with the number of rooms, total area and room per area.
Based on the checks made on the given data, the valuation price of a house has a direct relationship with a set of features. I tried to identify these features and used multiple linear regression as the model for evaluation. Drawing on what I learned from the data, I chose the features that had an impact on the price increase. My final features are The_value_of_each_address, Room, Parking and Area.
House price prediction can raise a number of professional, ethical, and legal issues. Here are some potential considerations:
Accuracy: House price predictions need to be accurate in order to be useful to potential buyers and sellers. Any professional involved in providing these predictions must ensure they are using the most up-to-date data and reliable models to make their predictions.
Transparency: Professionals providing house price predictions must be transparent about their methodologies, assumptions, and limitations. This is important to ensure that clients have a clear understanding of the predictions they are receiving and can make informed decisions based on that information.
Fairness: House price prediction must be conducted in a fair and unbiased manner, without discrimination based on factors such as race, ethnicity, gender, or socioeconomic status.
Privacy: When using personal data to make house price predictions, professionals must ensure they are complying with data privacy laws and protecting the privacy of individuals whose data is being used.