BCSE399J (Summer Industrial Internship)/
CBS1902/CSE1902/ CSI3903 (Industrial Internship)
Type of Intern: Industry
Title of Intern: Machine Learning
Internship completed at :
<i-RESEAT Conference> ,
<Best Western Nada Don Mueang Airport Hotel,
Bangkok, Thailand> , i-RESEAT , ireseatinter@gmaejo.mju.ac.th
Duration : 30 Days : < 21/11/2023 > to < 20/12/2023>
Date of Presentation: 04/05/2024
by
Aryan Rajesh
20BCE0718
School of Computer Science and Engineering
Internship Review - Presentation
Overview (what was first discovered and
why it is a breakthrough)
This project uses housing data from Kaggle to build price prediction models.
The data typically includes attributes like square footage, location, and
number of bedrooms. Location needs special preprocessing because it has an
outsized impact on prices; one-hot encoding of neighborhoods is a common
technique. Identifying and removing outliers is also very important.
Predictive accuracy (for regression, usually the R² score) in the 70-80%
range on unseen test data is generally considered good. Model
interpretability also matters, to understand which factors are most
influential. Insights can also be extracted, e.g. ranking locations by
average price per square foot; these add business value beyond the
predictions themselves.
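The location ranking and one-hot encoding mentioned above can be sketched with pandas. This is a minimal illustration only: the listings, the column names (`location`, `total_sqft`, `price`) and the prices are made-up values, not the project's data.

```python
import pandas as pd

# Made-up listings; column names are assumptions for illustration.
df = pd.DataFrame({
    "location": ["Whitefield", "Indiranagar", "Whitefield", "Hebbal", "Indiranagar"],
    "total_sqft": [1200, 1000, 1500, 1100, 1400],
    "price": [6_000_000, 12_000_000, 7_500_000, 6_600_000, 15_400_000],  # rupees
})

# Price per square foot for each listing.
df["price_per_sqft"] = df["price"] / df["total_sqft"]

# Insight: rank locations by their average price per square foot.
ranking = (
    df.groupby("location")["price_per_sqft"]
    .mean()
    .sort_values(ascending=False)
)
print(ranking)

# One-hot encode location: one 0/1 column per neighborhood.
encoded = pd.get_dummies(df, columns=["location"], dtype=int)
print(encoded.columns.tolist())
```

Here Indiranagar ranks first (average 11,500 per sq ft), and the single `location` column is replaced by `location_Hebbal`, `location_Indiranagar` and `location_Whitefield`.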
Overview (what was first discovered and
why it is a breakthrough)
This study addresses house price prediction in Bengaluru using linear
and multiple regression techniques. Using a dataset covering 1298 unique
localities, the research focuses on forecasting land prices in the
Bengaluru Metropolitan Area (BMA) in Karnataka, India. Beyond the
House Price Index (HPI), factors such as area type, availability, location,
society, and apartment size are considered. The goal is to predict the
price per square foot for apartments. In metropolitan cities like
Bengaluru, determining accurate sales prices remains challenging,
making predictive modeling crucial for real estate decision-making. The
models aim to capture the complex interplay of these factors in
influencing individual house prices in the dynamic real estate market of
Bengaluru.
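A minimal sketch of preparing the price-per-square-foot target, assuming columns shaped like the public Kaggle Bengaluru listings (`size` strings such as "2 BHK", `total_sqft`, and `price` in lakh rupees). The rows and the rare-location threshold are illustrative assumptions; merging rare localities into "other" is one common way to keep ~1300 localities manageable for one-hot encoding.

```python
import pandas as pd

# Toy rows; assumed columns modelled on the Kaggle Bengaluru listings.
df = pd.DataFrame({
    "location": ["Whitefield", "Whitefield", "Whitefield", "Yelahanka", "Hebbal"],
    "size": ["2 BHK", "3 BHK", "2 BHK", "4 Bedroom", "1 RK"],
    "total_sqft": [1200.0, 1650.0, 1100.0, 2400.0, 500.0],
    "price": [60.0, 90.0, 55.0, 180.0, 25.0],  # lakh rupees (assumed unit)
})

# Apartment size: take the leading number from "2 BHK", "4 Bedroom", "1 RK".
df["bhk"] = df["size"].str.split().str[0].astype(int)

# Target: price per square foot in rupees (1 lakh = 100,000 rupees).
df["price_per_sqft"] = df["price"] * 100_000 / df["total_sqft"]

# Merge localities with fewer listings than a threshold into "other".
# The threshold of 2 suits this toy data only; real data needs a larger one.
counts = df["location"].value_counts()
rare = counts[counts < 2].index
df["location"] = df["location"].where(~df["location"].isin(rare), "other")
print(df[["location", "bhk", "price_per_sqft"]])
```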
Introduction
Understanding House Price Prediction
Introduction:
• Houses are essential for shelter and livelihood, influencing economic, financial, and political structures.
• Fluctuations in house prices pose challenges for stakeholders, including buyers and investors.
Importance of Data:
• Real estate data analysis aids in predicting market variations and mitigating future losses.
• Accurate prediction models are crucial for real estate businesses to make informed decisions.
Prediction Models:
• Support Vector Regression, Artificial Neural Networks, and Bayesian classifiers are commonly used for house price
prediction.
• These models help stakeholders determine property valuations and make budget-based decisions.
Factors Influencing House Prices
Key Considerations for Home Buyers:
• Location, property size, and proximity to amenities are critical factors.
• Environmental factors such as air quality and noise pollution also affect property prices.
Market Trends:
• Bengaluru, as a real estate hotspot, has seen shifts in demand and price due to factors like COVID-19
and regulatory changes.
• Trust issues with property developers have affected sales and prices in cities like Bengaluru.
Machine Learning Approach:
• Bayesian Classifier, a supervised learning technique, is utilized for predictive analysis based on various
factors.
House Price Prediction Model Overview
Utilization of Technology:
• Leveraging data from trusted sources and employing machine learning algorithms for prediction.
• Supervised learning techniques like Bayesian Classifier are used for accurate predictions.
Application Across Industries:
• Predictive models find applications in economics, banking, healthcare, e-commerce, and more.
• Algorithms like KNN, Decision Tree, and Regression techniques are employed based on data characteristics
and requirements.
Ensuring Integrity:
• The model aims to provide the best predictions based on gathered data, maintaining system integrity and
user trust.
Aim/Objective
The primary aim of the project was to develop an accurate and reliable
model for predicting house prices in Bengaluru, India. The specific
objectives were:
• To explore and analyze the factors that influence house prices in
Bengaluru.
• To preprocess and clean the dataset to prepare it for modeling.
• To implement and evaluate various machine learning algorithms for
house price prediction.
• To identify the best-performing model and deploy it for practical usage.
Motivation
• The motivation behind this project was driven by the growing
demand for housing in metropolitan cities like Bengaluru and the
need for reliable tools to assist home buyers, investors, and real
estate developers in making informed decisions.
• Accurate house price prediction can help stakeholders determine
fair property valuations, identify overpriced or underpriced
properties, and make sound investment choices.
About the Industry
The 5th International Conference on Renewable Energy, Sustainable Environmental and
Agricultural and Artificial Intelligence Technologies (i-RESEAT-2023) is a hybrid-mode
conference organized by Thammasat University (Pathumtani City, Thailand) and
co-organized with Maejo University (Chiang Mai City, Thailand), Kaohsiung Medical University
(Kaohsiung City, Taiwan), University of Stavanger (Stavanger, Norway), and other supporting
partner universities and institutions around the world. The conference aims to be the premier
forum for presenting new breakthroughs and research results in the theoretical, experimental,
and practical domains of energy, environment, and agriculture innovations and technologies.
It also serves as an international forum for researchers, practitioners, industries, and
educators to present and discuss recent innovations, trends, and concerns, as well
as practical challenges encountered and the solutions implemented. The conference brings
together researchers, engineers, and scientists from around the world working in this
field. The main theme of i-RESEAT-2023 is “Go-Green, Go-Eco, Go-Smart Agri-Tech, and Go-BCG”.
Certificate
Skills Acquired during Industrial Internship Period
During the industrial internship, I gained skills in several domains, which can be grouped into
the following sections:
Data Preprocessing and Exploration using Pandas
Handling Missing Data
• Identifying and addressing missing values in the dataset
• Employing techniques like imputation or removal of missing data
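Both strategies (imputation and removal) can be sketched in a few lines of pandas. The rows, column names and the decision to drop `balcony` are illustrative assumptions, not the project's actual choices.

```python
import numpy as np
import pandas as pd

# Made-up rows with gaps; column names are illustrative assumptions.
df = pd.DataFrame({
    "location": ["Whitefield", None, "Hebbal", "Hebbal"],
    "bath": [2.0, 3.0, np.nan, 2.0],
    "balcony": [1.0, np.nan, np.nan, 2.0],
    "price": [60.0, 85.0, 40.0, 55.0],
})

# Count missing values per column before deciding how to handle them.
print(df.isnull().sum())

# Removal: drop a column judged too sparse or unneeded (an illustrative choice).
df = df.drop(columns=["balcony"])
# Imputation: fill numeric gaps with the column median.
df["bath"] = df["bath"].fillna(df["bath"].median())
# Removal: drop rows missing a key categorical field.
df = df.dropna(subset=["location"])
print(df)
```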
Dealing with Outliers
• Detecting and handling outliers in the data
• Using appropriate methods like winsorization or removal of extreme values
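A minimal sketch of both outlier options, using the interquartile-range (IQR) rule to set the bounds. The IQR rule is one common choice assumed here; percentile bounds (e.g. 1st/99th) are also often used for winsorization. The values are made up.

```python
import pandas as pd

# Illustrative price-per-sqft values with one extreme listing.
prices = pd.Series([4200, 4800, 5100, 5300, 5600, 6000, 6200, 35000], name="price_per_sqft")

# IQR rule: points more than 1.5 * IQR outside the quartiles count as outliers.
q1, q3 = prices.quantile(0.25), prices.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Option 1: remove the extreme values.
removed = prices[(prices >= lower) & (prices <= upper)]

# Option 2: winsorize, i.e. cap values at the bounds instead of dropping rows.
winsorized = prices.clip(lower=lower, upper=upper)

print(len(prices), len(removed), winsorized.max())
```

Removal shrinks the dataset, while winsorization keeps every row but limits the influence of extreme listings.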
Data Transformation
• Converting data into suitable formats for modeling
• Techniques like normalization, scaling, and encoding categorical variables
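Scaling and encoding can be combined with scikit-learn's `ColumnTransformer`. This sketch applies standardization, min-max normalization and one-hot encoding to different columns; the column names and values are illustrative assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

df = pd.DataFrame({
    "total_sqft": [1000.0, 1500.0, 2000.0],
    "bath": [1, 2, 3],
    "area_type": ["Super built-up Area", "Plot Area", "Super built-up Area"],
})

transformer = ColumnTransformer(
    [
        ("standardize", StandardScaler(), ["total_sqft"]),  # zero mean, unit variance
        ("normalize", MinMaxScaler(), ["bath"]),            # rescale to [0, 1]
        ("encode", OneHotEncoder(handle_unknown="ignore"), ["area_type"]),  # 0/1 columns
    ],
    sparse_threshold=0,  # always return a dense NumPy array
)
X = transformer.fit_transform(df)
print(X)
```

Fitting the transformer on training data only, then reusing it on test data, avoids leaking test statistics into the model.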
Exploratory Data Analysis (EDA)
• Visualizing data using various plots (scatter plots, bar plots, histograms)
• Analyzing patterns, trends, and relationships between variables
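The numbers behind such plots can be computed directly with pandas. A minimal sketch on made-up listings (price in lakh rupees, an assumed unit):

```python
import pandas as pd

# Illustrative listings (price in lakh rupees).
df = pd.DataFrame({
    "total_sqft": [800, 1000, 1200, 1500, 2400],
    "bhk": [1, 2, 2, 3, 4],
    "price": [40, 52, 60, 78, 150],
})

print(df.describe())   # central tendency and spread of each feature
print(df.skew())       # positive skew means a long right tail (e.g. luxury listings)

# Pattern: how average price changes with the number of bedrooms.
by_bhk = df.groupby("bhk")["price"].mean()
print(by_bhk)
```

With matplotlib installed, `df.hist()` draws the histograms and `df.plot.scatter(x="total_sqft", y="price")` the scatter plot corresponding to this summary.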
Feature Engineering and Selection
Relevant Feature Identification
• Determining the most influential features for house price prediction
• Employing techniques like correlation analysis and feature importance
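Correlation-based feature selection can be sketched by ranking features by the absolute Pearson correlation with the target. The data and the 0.5 cut-off are illustrative assumptions; correlation only captures linear relationships, so it is a first filter rather than a final decision.

```python
import pandas as pd

# Illustrative listings (price in lakh rupees).
df = pd.DataFrame({
    "total_sqft": [800, 1000, 1200, 1500, 2400, 1100],
    "bath": [1, 2, 2, 3, 4, 2],
    "balcony": [1, 2, 0, 1, 1, 2],
    "price": [40, 52, 60, 78, 150, 55],
})

# Absolute Pearson correlation of every feature with the target, strongest first.
corr_with_target = df.corr()["price"].drop("price").abs().sort_values(ascending=False)
print(corr_with_target)

# Keep features above an illustrative threshold (tune per dataset).
selected = corr_with_target[corr_with_target > 0.5].index.tolist()
print(selected)
```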
Feature Creation
• Generating new features from existing ones
• Combining or transforming features to capture additional information
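Two typical feature-creation steps are sketched below: transforming a raw text area column into a number, and combining area and bedrooms into area per bedroom. The formats handled (plain numbers and ranges like "2100 - 2850") are assumptions modelled on the Kaggle Bengaluru data; `sqft_to_number` is a hypothetical helper, and other unit strings are simply turned into NaN.

```python
import pandas as pd

def sqft_to_number(value):
    """Return a float for '1200', the midpoint for a range like '2100 - 2850',
    and NaN for formats this sketch does not handle (e.g. '34.46Sq. Meter')."""
    parts = str(value).split("-")
    try:
        if len(parts) == 2:
            return (float(parts[0]) + float(parts[1])) / 2
        return float(value)
    except ValueError:
        return float("nan")

df = pd.DataFrame({
    "total_sqft": ["1200", "2100 - 2850", "34.46Sq. Meter"],
    "bhk": [2, 4, 1],
})

# Transform: turn the raw text column into a numeric feature.
df["total_sqft_num"] = df["total_sqft"].apply(sqft_to_number)
# Combine: area per bedroom, useful for spotting implausible listings.
df["sqft_per_bhk"] = df["total_sqft_num"] / df["bhk"]
# Drop rows whose area could not be parsed.
df = df.dropna(subset=["total_sqft_num"])
print(df)
```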
Feature Scaling and Encoding
• Scaling numerical features for improved model performance
• Encoding categorical features for use in machine learning algorithms
Machine Learning Modeling
Supervised Learning Algorithms
• Linear Regression
• Decision Trees
• Lasso Regression
Model Evaluation Metrics
• Mean Squared Error (MSE)
• R-squared
• Root Mean Squared Error (RMSE)
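The three metrics can be computed with `sklearn.metrics`. The actual and predicted prices below are made-up numbers chosen so the results are easy to check by hand.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y_true = np.array([50.0, 60.0, 80.0, 120.0])   # actual prices (illustrative)
y_pred = np.array([52.0, 57.0, 85.0, 110.0])   # model predictions (illustrative)

mse = mean_squared_error(y_true, y_pred)  # mean of squared errors
rmse = np.sqrt(mse)                       # back in the same units as price
r2 = r2_score(y_true, y_pred)             # 1 - SS_res / SS_tot

print(f"MSE={mse:.2f} RMSE={rmse:.2f} R2={r2:.3f}")  # MSE=34.50 RMSE=5.87 R2=0.952
```

The errors are -2, 3, -5 and 10, so MSE = (4 + 9 + 25 + 100) / 4 = 34.5, and R² = 1 - 138 / 2875 = 0.952.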
Hyperparameter Tuning
• Techniques like GridSearchCV for optimizing model parameters
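A minimal `GridSearchCV` sketch for a single model (Lasso), on synthetic data where only the first two features matter. The grid values are illustrative assumptions; `GridSearchCV` tries every combination with 5-fold cross-validation and keeps the best.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

# Synthetic data: the third feature is irrelevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=120)

search = GridSearchCV(
    Lasso(),
    param_grid={"alpha": [0.01, 0.1, 1.0], "selection": ["random", "cyclic"]},
    cv=5,  # 5-fold cross-validation for every parameter combination
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```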
Web Development and Deployment
Flask Framework
• Building a web application using the Python Flask framework
• Integrating the trained machine learning model for user interaction
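Integrating a trained model usually means saving it once and loading it in the web app, plus a helper that turns user inputs into a feature vector in training-column order. This sketch uses `pickle`; the column layout, the `loc_` prefix, the toy data and the `predict_price` helper are hypothetical.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical feature layout: numeric columns first, then one-hot location columns.
columns = ["total_sqft", "bath", "bhk", "loc_Hebbal", "loc_Whitefield"]

X = np.array([
    [1000, 2, 2, 1, 0],
    [1500, 3, 3, 1, 0],
    [1200, 2, 2, 0, 1],
    [1800, 3, 3, 0, 1],
    [900, 1, 1, 0, 1],
], dtype=float)
y = np.array([60.0, 92.0, 66.0, 100.0, 48.0])  # toy prices in lakh rupees
model = LinearRegression().fit(X, y)

# Serialize the trained model so the web app can load it without retraining.
# In practice it would be written to a file opened in "wb" mode with pickle.dump.
blob = pickle.dumps(model)
loaded = pickle.loads(blob)

def predict_price(location, sqft, bath, bhk):
    """Build a feature vector in training-column order and predict."""
    x = np.zeros(len(columns))
    x[0], x[1], x[2] = sqft, bath, bhk
    loc_col = "loc_" + location
    if loc_col in columns:  # unknown locations keep all one-hot zeros
        x[columns.index(loc_col)] = 1
    return float(loaded.predict([x])[0])

print(predict_price("Hebbal", 1000, 2, 2))
```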
HTML, CSS and Web Design
• Creating user-friendly interfaces for input and output display
• Enhancing the website's visual appeal and usability
New Technologies/Frameworks/Real-time
Problems/Analysis-based Knowledge Acquired
Introduction to Machine Learning Frameworks
Scikit-learn Library
• Utilizing the powerful Scikit-learn library in Python
• Implementing various machine learning algorithms and preprocessing
techniques
TensorFlow or PyTorch (if applicable)
• Exposure to deep learning frameworks like TensorFlow or PyTorch
• Understanding the potential of deep learning for complex problems
Real-world Problem: House Price Prediction
Understanding the Importance
• Recognizing the significance of accurate house price prediction
• Implications for stakeholders (buyers, sellers, investors, developers)
Influencing Factors
• Identifying key factors that impact house prices
• Location, area, number of rooms, amenities, neighborhood characteristics
Data Acquisition and Preprocessing
• Sourcing relevant datasets for house price prediction
• Cleaning, transforming, and preparing data for modeling
Exploratory Data Analysis and Visualization
Univariate Analysis
• Analyzing the distribution and characteristics of individual features
• Identifying outliers, skewness, and central tendencies
Bivariate Analysis
• Exploring relationships between pairs of features
• Scatter plots, correlation matrices, and other visualizations
Multivariate Analysis
• Investigating interactions among multiple features
• Techniques like principal component analysis (PCA) or t-SNE
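A minimal PCA sketch: three correlated, synthetic features (named after house attributes for readability only) are standardized and projected onto two principal components. `explained_variance_ratio_` shows how much of the variation each component captures.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic, deliberately correlated features.
rng = np.random.default_rng(1)
sqft = rng.uniform(600, 3000, size=100)
bhk = np.round(sqft / 600) + rng.integers(0, 2, size=100)
bath = bhk + rng.integers(-1, 1, size=100)
X = np.column_stack([sqft, bhk, bath])

X_scaled = StandardScaler().fit_transform(X)  # PCA is sensitive to feature scale
pca = PCA(n_components=2).fit(X_scaled)
print(pca.explained_variance_ratio_)

X_2d = pca.transform(X_scaled)  # 2-D coordinates, e.g. for a scatter plot
```

Because the three features move together, the first component captures most of the variance, which indicates redundancy among them.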
Model Evaluation and Interpretation
Evaluation Metrics
• Understanding and interpreting evaluation metrics, and using K-Fold Cross Validation, which evaluates a
predictive model's performance by splitting the dataset into k folds and, for each fold, training the
model on the remaining folds and evaluating it on that fold
• Tuning hyperparameters suited to each model, such as copy_X and n_jobs for Linear Regression, alpha and
selection for Lasso Regression, and criterion and splitter for Decision Tree
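K-Fold Cross Validation as described above can be sketched with `KFold` and `cross_val_score`. The data is synthetic; `ShuffleSplit` is an alternative splitter when repeated random train/test splits are preferred.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data: price roughly proportional to area, plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(500, 3000, size=(150, 1))
y = 5_000 * X[:, 0] + rng.normal(0, 500_000, size=150)

# 5 folds: each fold is held out once while the model trains on the other 4.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=kfold)  # one R² per fold
print(scores.round(3), scores.mean().round(3))
```

Averaging the per-fold scores gives a more stable estimate than a single train/test split.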
Model Comparison
• Comparing the performance of the different machine learning models to select the most accurate one
• Using GridSearchCV to identify the best model and parameters among those tried, based on the cross-validated score
Feature Importance
• Determining the relative importance of features
• Techniques like coefficients (linear models) or feature importances (tree-based models)
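Both importance techniques can be read directly from fitted scikit-learn models: `coef_` for linear models and `feature_importances_` for trees. The data is synthetic and the feature names are illustrative; the third feature is pure noise. Features are standardized before the linear fit so that coefficient magnitudes are comparable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

feature_names = ["total_sqft", "bath", "noise"]
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = 4.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.3, size=200)

X_std = StandardScaler().fit_transform(X)  # standardize so coefficients are comparable
linear = LinearRegression().fit(X_std, y)
tree = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)

for name, coef, imp in zip(feature_names, linear.coef_, tree.feature_importances_):
    print(f"{name:>10}  coef={coef:+.2f}  tree_importance={imp:.2f}")
```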
Web Application Development and Deployment
Front-end Development
• Creating user-friendly interfaces using HTML, CSS, and JavaScript
• Designing intuitive layouts and interactive elements
Back-end Integration
• Integrating the trained machine learning model with the web application using Flask framework
• Handling user inputs and generating predictions
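A minimal Flask sketch of the back-end step: a route reads form fields, validates them, and returns a prediction as JSON. The route path, the form field names and the `predict_price` stub are hypothetical; in the real app the stub would call the trained model.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict_price(location, sqft, bath, bhk):
    """Hypothetical stand-in for the trained model's prediction helper."""
    return round(0.05 * sqft + 2 * bath + 5 * bhk, 2)

@app.route("/predict_home_price", methods=["POST"])
def predict_home_price():
    # Read and validate the form fields sent by the front-end (names assumed).
    try:
        location = request.form["location"]
        sqft = float(request.form["total_sqft"])
        bath = int(request.form["bath"])
        bhk = int(request.form["bhk"])
    except (KeyError, ValueError):
        return jsonify({"error": "invalid or missing input"}), 400
    return jsonify({"estimated_price": predict_price(location, sqft, bath, bhk)})
```

Saved as `app.py`, it can be started locally with `flask --app app run` (Flask 2.2 or later), which serves on http://127.0.0.1:5000 by default.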
Deployment Strategies
• Deploying the web application on localhost
• Ensuring scalability, security, and reliability
Conclusion
The industrial internship provided a valuable opportunity to apply
theoretical concepts to a real-world problem of house price prediction in
Bengaluru. Through hands-on experience, I have acquired practical skills
in data preprocessing, exploratory data analysis, feature engineering,
and implementing machine learning algorithms like linear regression,
decision trees, and lasso regression.
A key achievement was the development of a linear regression model for
predicting house prices that achieved an accuracy score of 85%. This model
was deployed as a user-friendly web
application using the Flask framework, allowing users to input relevant
parameters and obtain predicted house prices.
The internship also exposed me to the latest technologies and frameworks
in the field of data analysis and machine learning, such as scikit-learn, data
visualization techniques, and web development tools like HTML, CSS, and
JavaScript.
Overall, the industrial internship proved to be an enriching experience,
enabling me to gain practical skills, industry exposure, and a deeper
understanding of the real estate domain. The knowledge and expertise
acquired during this internship will serve as a strong foundation for future
endeavors in the field of data science and machine learning.
