Downloaded by Rakesh Swain (srakeshswain005@gmail.com)
Car Price prediction final pdf
Computer science (AJ Institute of Engineering and Technology)
A Mini Project Report on
CAR PRICE PREDICTION USING LINEAR REGRESSION
Submitted to
Jawaharlal Nehru Technological University, Hyderabad
in partial fulfillment of requirements for the award of the degree of
BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE AND ENGINEERING
By
A Dinesh (19BD1A05C1)
A Rahul (19BD1A05C5)
E Sri Kumar (19BD1A05CJ)
G Pranav (19BD1A05CK)
Under the guidance of
Ms. NASREEN SULTANA
Assistant Professor
Department of CSE
Department of Computer Science and Engineering
KESHAV MEMORIAL INSTITUTE OF TECHNOLOGY
Approved by AICTE, Affiliated to JNTUH
3-5-1206, Narayanaguda, Hyderabad – 500029
2022-2023
KESHAV MEMORIAL INSTITUTE OF TECHNOLOGY
(Accredited by NBA & NAAC, Approved By A.I.C.T.E., Reg by Govt of Telangana
State & Affiliated to JNTU, Hyderabad)
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
CERTIFICATE
This is to certify that the project entitled CAR PRICE PREDICTION USING LINEAR
REGRESSION being submitted by
A. Dinesh (19BD1A05C1)
A. Rahul (19BD1A05C5)
E. Sri Kumar (19BD1A05CJ)
G. Pranav (19BD1A05CK)
in partial fulfilment for the award of Bachelor of Technology in Computer Science and Engineering
affiliated to the Jawaharlal Nehru Technological University, Hyderabad during the year 2022-23.
Internal Guide Head of the Department
(Ms. Nasreen Sultana) (Dr. S. Padmaja)
Submitted for Viva Voce Examination held on
External Examiner
Unit of Keshav Memorial Educational Society
#: 3-5-1026 Narayanaguda Hyderabad 500029.
040-3261407 www.kmit.in e-mail: principal@kmit.in
Vision of KMIT
Producing quality graduates trained in the latest technologies and related tools and striving to make India a
world leader in software and hardware products and services. To achieve academic excellence by imparting
in depth knowledge to the students, facilitating research activities and catering to the fast growing and ever-
changing industrial demands and societal needs.
Mission of KMIT
• To provide a learning environment that inculcates problem solving skills, professional, ethical
responsibilities, lifelong learning through multi modal platforms and prepare students to become
successful professionals.
• To establish industry-institute interaction to make students ready for the industry.
• To provide exposure to students on latest hardware and software tools.
• To promote research based projects/activities in the emerging areas of technology convergence.
• To encourage and enable students to not merely seek jobs from the industry but also to create new
enterprises.
• To induce a spirit of nationalism which will enable the student to develop, understand India's
challenges and to encourage them to develop effective solutions.
• To support the faculty to accelerate their learning curve to deliver excellent service to students.
Vision & Mission of CSE
Vision of the CSE
To be among the region's premier teaching and research Computer Science and Engineering departments
producing globally competent and socially responsible graduates in the most conducive academic
environment.
Mission of the CSE
• To provide faculty with state of the art facilities for continuous professional development and
research, both in foundational aspects and of relevance to emerging computing trends.
• To impart skills that transform students to develop technical solutions for societal needs and
inculcate entrepreneurial talents.
• To inculcate an ability in students to pursue the advancement of knowledge in various specializations
of Computer Science and Engineering and make them industry-ready.
• To engage in collaborative research with academia and industry and generate adequate resources for
research activities for seamless transfer of knowledge resulting in sponsored projects and
consultancy.
• To cultivate responsibility through sharing of knowledge and innovative computing solutions that
benefit the society-at-large.
• To collaborate with academia, industry and community to set high standards in academic excellence
and in fulfilling societal responsibilities.
PROGRAM OUTCOMES (POs)
1. Engineering Knowledge: Apply the knowledge of mathematics, science, engineering fundamentals and
an engineering specialization to the solution of complex engineering problems.
2. Problem analysis: Identify, formulate, review research literature, and analyse complex engineering
problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and
engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems and design system
components or processes that meet the specified needs with appropriate consideration for the public health
and safety, and the cultural, societal, and environmental considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research methods
including design of experiments, analysis and interpretation of data, and synthesis of the information to
provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering
and IT tools including prediction and modelling to complex engineering activities with an understanding of
the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health,
safety, legal and cultural issues and the consequent responsibilities relevant to professional engineering
practice.
7. Environment and sustainability: Understand the impact of the professional engineering solutions in
societal and environmental contexts and demonstrate the knowledge of, and need for sustainable
development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the
engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or leader in diverse
teams and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the engineering
community and with society at large, such as being able to comprehend and write effective reports and
design documentation, make effective presentations, and give and receive clear instructions.
11. Project management and finance: Demonstrate knowledge and understanding of the engineering and
management principles and apply these to one's own work, as a member and leader in a team, to manage
projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.
PROGRAM SPECIFIC OUTCOMES (PSOs)
PSO1: An ability to analyse the common business functions to design and develop appropriate Information
Technology solutions for social upliftment.
PSO2: Shall have expertise in evolving technologies like Python, Machine Learning, Deep Learning, Internet of
Things (IoT), Data Science, Full stack development, Social Networks, Cyber Security, Big Data, Mobile Apps, CRM,
ERP, etc.
PROGRAM EDUCATIONAL OBJECTIVES (PEOs)
PEO1: Graduates will have successful careers in computer related engineering fields or will be able to
successfully pursue advanced higher education degrees.
PEO2: Graduates will try and provide solutions to challenging problems in their profession by applying
computer engineering principles.
PEO3: Graduates will engage in life-long learning and professional development by rapidly adapting
to a changing work environment.
PEO4: Graduates will communicate effectively, work collaboratively and exhibit high levels of
professionalism and ethical responsibility.
PROJECT OUTCOMES
P1: To provide a friendly environment to the user
P2: To predict dependent variable for given user input data(features).
P3: To give the accurate price for used cars.
P4: Developing web applications using flask-framework.
LOW - 1
MEDIUM - 2
HIGH - 3
PROJECT OUTCOMES MAPPING PROGRAM OUTCOMES
PO PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
P1
3 3 2 2
P2
2 2 2 2 2 1
P3
2 2 3 2 2 2
P4
1 2 3 2 2 1
PROJECT OUTCOMES MAPPING PROGRAM SPECIFIC OUTCOMES
PSO PSO1 PSO2
P1
1
P2
3
P3
2 3
P4
1 3
PROJECT OUTCOMES MAPPING PROGRAM EDUCATIONAL OBJECTIVES
PEO PEO1 PEO2 PEO3 PEO4
P1
1 2
P2
2
P3
1 2
P4
1 2 2
DECLARATION
We hereby declare that the project report entitled “CAR PRICE PREDICTION USING
LINEAR REGRESSION (M.L)” is done in the partial fulfillment for the award of the Degree in
Bachelor of Technology in Computer Science and Engineering affiliated to Jawaharlal Nehru
Technological University, Hyderabad. This project has not been submitted anywhere else.
ABBI DINESH (19BD1A05C1)
ASAD RAHUL (19BD1A05C5)
ERRAGALA SRI KUMAR (19BD1A05CJ)
GYARA PRANAV KUMAR (19BD1A05CK)
ACKNOWLEDGMENT
We take this opportunity to thank all the people who have rendered their full support
to our project work.
We render our thanks to Dr. Maheshwar Dutta, B.E., M Tech., Ph.D., Principal who
encouraged us to do the Project.
We are grateful to Mr. Neil Gogte, Director for facilitating all the amenities required
for carrying out this project.
We express our sincere gratitude to Mr. S. Nitin, Director and Dr. D. Jaya Prakash,
Dean Academics for providing an excellent environment in the college.
We are also thankful to Dr. S. Padmaja, Head of the Department for providing us
with both time and amenities to make this project a success within the given schedule.
We are also thankful to our guide Ms. Nasreen Sultana, for her valuable guidance
and encouragement given to us throughout the project work.
We would like to thank the entire CSE Department faculty, who helped us directly
and indirectly in the completion of the project. We sincerely thank our friends and family
for their constant motivation during the project work.
ABBI DINESH (19BD1A05C1)
ASAD RAHUL (19BD1A05C5)
ERRAGALA SRI KUMAR (19BD1A05CJ)
GYARA PRANAV KUMAR (19BD1A05CK)
CONTENT
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
CHAPTERS
1. INTRODUCTION
1.1. Machine Learning
1.2. What is Machine Learning
1.3. Types of Machine Learning
1.4. Linear Regression
1.5. Objective & Problem Statement
1.6. Purpose of Project
1.7. Architecture Diagram
1.8. Project Goal
2. SOFTWARE REQUIREMENTS SPECIFICATIONS
2.1. Requirements Specification Document
2.2. Functional Requirements
2.3. Non-Functional Requirements
2.4. Software Requirements
2.5. Hardware Requirements
2.6. Requirement Analysis
2.7. Test Construction and Verification
2.8. Test Execution and Bug Reporting
2.9. Final Testing and Implementation
2.10. Post Implementation
2.11. Technologies used
3. LITERATURE SURVEY
3.1. Proposed Model
3.2. Paper Work
3.3. Related Work
4. SYSTEM DESIGN
4.1. Introduction to UML
4.2. UML Diagrams
4.2.1. Use Case diagram
4.2.2. Sequence diagram
4.2.3. Class diagram
4.2.4. System Design
4.2.5. State Chart Diagram
5. IMPLEMENTATION
5.1. Pseudo code
5.2. Data Cleaning using Google Colab
5.2. Code Snippets
6. TESTING
6.1. Introduction to Testing
6.2. Test Cases
7. SCREENSHOTS
7.1. Layout of Testing Platform
7.2. Log & Reference
7.3. UI of Web Application
8. FURTHER ENHANCEMENTS
9. CONCLUSION
10. REFERENCES
ABSTRACT
This study addresses the problem of estimating the resale price of a used car. Our
motivation is to give owners and buyers a quick and reasonable estimate of a car's value
using machine learning techniques, which saves a lot of time and money. Car price
prediction has been a high-interest research area because it otherwise requires noticeable
effort and the knowledge of a field expert. A considerable number of distinct attributes
must be examined for a reliable and accurate prediction. The production of cars has been
steadily increasing over the past decade, with over 70 million passenger cars produced in
2016. This has given rise to the used car market, which has become a booming industry in
its own right. The recent advent of online portals has increased the need for both the
customer and the seller to be better informed about the trends and patterns that determine
the value of a used car in the market. To build a model for predicting the price of used
cars, we applied one machine learning technique, namely linear regression. In linear
regression there can be multiple independent variables but only one dependent variable,
whose actual and predicted values are compared to assess the accuracy of the results. Our
paper proposes a system in which price is the dependent variable to be predicted, and this
price is derived from factors such as kilometers driven, year of purchase, car company, car
model, and fuel type.
Keywords: Car Price Prediction, Linear Regression, Machine Learning, dependent
variable etc.
LIST OF FIGURES
1.1 Machine Learning
1.2 Machine Learning & Traditional Programming
1.3 Types of Machine Learning
1.3.1 Data Set of Supervised Learning
1.3.1.2 Types of Supervised Learning
1.3.2 Unsupervised Learning
1.3.2.1 Types of Unsupervised Learning
1.3.4 Reinforcement Learning
1.4 Linear Regression
1.7 Architecture of Linear Regression
3.8.1 Google Colab
4.2.1 Use Case Diagram - UML
4.2.2 Sequence Diagram - UML
4.2.3 Class Diagram - UML
4.2.4 System Design - UML
4.2.5 State Chart Diagram - UML
7.1 Selenium IDE Testing Platform
7.2 Log & Reference using Selenium IDE
7.3 Register page of Web Application - UI
7.4 Login page of Web Application - UI
7.5 Home page of Web Application - UI
7.6 Displaying available car companies - UI
7.7 Displaying suitable car models - UI
7.8 Displaying available years - UI
7.9 Displaying available Fuel Types - UI
7.10 Displaying Predicted Price - UI
LIST OF TABLES
6.2 Test Case for Web Application
6.2.1 Launching web application
6.2.2 Registration of user details
6.2.3 Login Positive test case
6.2.4 Login Negative test case
6.2.5 Displaying Attributes
6.2.6 Selecting Attributes
6.2.7 Selecting attributes for correct attributes
6.2.8 Selecting attributes for incorrect attributes
6.2.9 Home button Test case
6.2.10 Logout button Test case
CHAPTER - 1
1. INTRODUCTION
1.1 MACHINE LEARNING
Machine Learning is the field of study that gives computers the capability to learn without
being explicitly programmed. ML is one of the most exciting technologies that one could
come across. As is evident from the name, it gives the computer the quality that makes it more
similar to humans: the ability to learn. Machine learning is actively being used today, perhaps in
many more places than one would expect.
Figure 1.1 Machine Learning
1.2 What is Machine Learning?
Arthur Samuel, a pioneer in the field of artificial intelligence and computer gaming, coined
the term “Machine Learning”. He defined machine learning as the “field of study that gives
computers the capability to learn without being explicitly programmed”. In layman's terms,
Machine Learning (ML) can be explained as automating and improving the learning
process of computers based on their experience, without their being explicitly programmed
for each task. The process starts with feeding in good-quality data and then training our
machines (computers) by building machine learning models using the data and different
algorithms. The choice of algorithm depends on the type of data we have and the kind of
task we are trying to automate. Example: training of students for exams. While preparing
for exams, students do not simply cram the subject but try to learn it with complete
understanding. Before the examination, they feed their machine (brain) with a good amount of
high-quality data (questions and answers from different books, teachers' notes, or online video
lectures).
Figure 1.2 Machine Learning & Traditional Programming
In effect, students train their brain with inputs as well as outputs, i.e., which approach or
logic to use to solve different kinds of questions. Each time they solve practice test
papers, they measure their performance (accuracy/score) by comparing their answers with the
answer key. Gradually, the performance keeps increasing and they gain confidence in the adopted
approach. That is how models are built: we train the machine with data (both inputs and outputs
are given to the model), and when the time comes we test it on data (inputs only) and compute the
model's score by comparing its answers with the actual outputs, which were not fed to it during
training. Researchers are working assiduously to improve algorithms and techniques so
that these models perform even better.
1.2.1 Basic Difference between ML and Traditional Programming
Traditional Programming: We feed in DATA (Input) + PROGRAM (logic), run it on
the machine, and get the output.
Machine Learning: We feed in DATA (Input) + Output, run it on the machine during
training and the machine creates its own program (logic), which can be evaluated while
testing.
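The contrast above can be sketched in a few lines of Python. This is only an illustration and not part of the project code: the Celsius-to-Fahrenheit task, the sample readings, and the function names are assumptions chosen for the example. The first function is a hand-written program; the second estimates the same rule from input/output pairs and is then checked on inputs whose answers were held back.

```python
# Traditional programming: the rule (logic) is written by hand.
def fahrenheit_by_rule(celsius):
    return 1.8 * celsius + 32.0


# Machine learning: the rule is estimated from (input, output) pairs.
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: inputs together with their known outputs.
train_c = [-10.0, 0.0, 10.0, 20.0, 30.0]
train_f = [14.0, 32.0, 50.0, 68.0, 86.0]
slope, intercept = fit_line(train_c, train_f)

# Testing: only inputs are given; predictions are compared with held-back answers.
test_c = [25.0, 40.0]
test_f = [77.0, 104.0]
predictions = [slope * c + intercept for c in test_c]
max_error = max(abs(p - t) for p, t in zip(predictions, test_f))
print(slope, intercept, max_error)
```

With noise-free data such as this, the learned slope and intercept should match the hand-written rule closely; with real measurements the learned rule would only approximate it.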
1.3 Types of Machine Learning
A machine is said to be learning from past experience (the data fed in) with respect
to some class of tasks if its performance on a given task improves with that experience.
For example, assume that a machine has to predict whether a customer will buy a specific
product, say “Antivirus”, this year or not. The machine does this by looking at
previous knowledge/past experience, i.e., the data on products that the customer has
bought every year. If he buys Antivirus every year, then there is a high probability
that the customer is going to buy an antivirus this year as well. This is how machine
learning works at the basic conceptual level.
Figure 1.3 Types of Machine Learning
1.3.1 Supervised Learning
Supervised learning is when the model is trained on a labeled dataset. A labeled
dataset is one that has both input and output parameters. In this type of learning, both the
training and validation datasets are labeled, as shown in the figures below.
Example
Figure 1.3.1 Data Set
Both of the above figures show labeled datasets, as follows:
Figure A: It is a dataset of a shopping store that is useful in predicting whether a
customer will purchase a particular product under consideration or not, based on his/her
gender, age, and salary.
Input: Gender, Age, Salary
Output: Purchased i.e. 0 or 1; 1 means yes the customer will purchase and 0 means that
the customer won’t purchase it.
Figure B: It is a Meteorological dataset that serves the purpose of predicting wind speed
based on different parameters.
Input: Dew Point, Temperature, Pressure, Relative Humidity, Wind Direction
Output: Wind Speed
1.3.1 Types of Supervised Learning:
Figure 1.3.1 Types of Supervised Learning
A. Classification:
It is a supervised learning task in which the output has defined labels (discrete
values). For example, in Figure A above, the output (Purchased) has defined labels, i.e., 0 or 1;
1 means the customer will purchase, and 0 means that the customer won't purchase. The
goal here is to predict discrete values belonging to a particular class and evaluate them on
the basis of accuracy.
It can be either binary or multi-class classification. In binary classification, the model
predicts one of two classes (0 or 1; yes or no), whereas in multi-class classification the model
chooses among more than two classes. Example: Gmail classifies mail into more than two classes, such as
social, promotions, updates, and forums.
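As a minimal sketch of binary classification evaluated by accuracy, the Python snippet below uses a 1-nearest-neighbour rule. The customer records are invented for illustration in the spirit of Figure A (they are not taken from the figure), and the choice of nearest neighbour is only one of the possible supervised algorithms.

```python
import math

# Hypothetical labeled data: (age in years, salary in thousands) -> purchased (1) or not (0).
train = [
    ((22, 20), 0), ((25, 28), 0), ((30, 35), 0),
    ((35, 60), 1), ((42, 75), 1), ((50, 90), 1),
]
# Held-back examples used only for evaluation.
test = [((24, 25), 0), ((45, 80), 1), ((33, 40), 0), ((38, 70), 1)]


def predict(features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    nearest = min(train, key=lambda item: math.dist(item[0], features))
    return nearest[1]


predicted = [predict(x) for x, _ in test]
actual = [label for _, label in test]
# Accuracy: the fraction of test examples whose predicted label matches the actual label.
accuracy = sum(p == a for p, a in zip(predicted, actual)) / len(test)
print(predicted, accuracy)
```

In practice the features would usually be scaled first, since salary and age are measured in different units and the raw distance is dominated by the larger one.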
B. Regression:
It is a supervised learning task in which the output is a continuous value.
For example, in Figure B above, the output (Wind Speed) does not take discrete values
but is continuous within a particular range. The goal here is to predict a value as close
to the actual output value as our model can, and evaluation is then done by calculating the
error. The smaller the error, the better our regression model.
Examples of Supervised Learning Algorithms:
• Linear Regression
• Logistic Regression
• Nearest Neighbor
• Gaussian Naive Bayes
• Decision Trees
• Support Vector Machine (SVM)
• Random Forest
1.3.2 Unsupervised Learning:
Unsupervised machine learning analyzes and clusters unlabeled datasets using
machine learning algorithms. These algorithms find hidden patterns in data without any
human intervention, i.e., we do not give outputs to our model. The model is trained on
input parameter values only and discovers the groups or patterns on its own. The dataset in
Figure A is mall data that contains information about the clients who subscribe to the mall.
Once subscribed, they are given a membership card, and the mall has complete
information about each customer and his/her every purchase. Using this data and
unsupervised learning techniques, the mall can easily group clients based on the
parameters we feed in.
Figure 1.3.2 Unsupervised Learning
The input to the unsupervised learning models is as follows:
Unstructured data: May contain noisy (meaningless) data, missing values, or unknown
data
1.3.2.1 Types of Unsupervised Learning are as follows:
Figure 1.3.2.1 Types of Unsupervised
Clustering: Broadly, this technique is applied to group data based on the patterns,
such as similarities or differences, that our model finds. These algorithms are used to
process raw, unclassified data objects into groups. For example, in the above figure, we
have not given output parameter values, so this technique will be used to group clients
based on the input parameters provided by our data.
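The grouping described above can be sketched with a plain k-means loop. The customer records, the choice of two clusters, and the deterministic starting centres are assumptions made for this example; they are not taken from the mall dataset in the figure.

```python
import math

# Hypothetical unlabeled customer data: (annual spend in thousands, visits per month).
points = [(5, 1), (6, 2), (7, 1), (40, 8), (42, 9), (45, 10)]


def k_means(data, k, iterations=10):
    """Plain k-means: assign each point to its nearest centre, then move each centre."""
    centres = list(data[:k])  # simple deterministic start: the first k points
    labels = [0] * len(data)
    for _ in range(iterations):
        # Assignment step: index of the nearest centre for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centres[c])) for p in data]
        # Update step: each centre moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centre if a cluster ends up empty
                centres[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centres


labels, centres = k_means(points, k=2)
print(labels, centres)
```

No output labels are supplied anywhere; the two groups (low-spend and high-spend customers) emerge from the input values alone.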
Association: This is a rule-based ML technique that finds useful
relations between the parameters of a large dataset. It is mainly used for
market basket analysis, which helps to better understand the relationship between different
products. For example, shopping stores use algorithms based on this technique to find the
relationship between the sales of one product and the sales of another, based on customer
behavior. For instance, if a customer buys milk, then he may also buy bread, eggs, or butter. Once
trained well, such models can be used to increase sales by planning different offers.
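To make the milk-and-bread example concrete, the sketch below computes two standard association-rule measures, support and confidence. These measures are not named in the text above and are introduced here as an assumption about how such a rule would typically be scored; the baskets are invented for illustration.

```python
# Hypothetical shopping baskets, one set of items per customer visit.
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "eggs"},
    {"milk", "bread", "butter"},
]


def support(items):
    """Fraction of baskets that contain every item in `items`."""
    return sum(items <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the fraction that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)


rule_support = support({"milk", "bread"})
rule_confidence = confidence({"milk"}, {"bread"})
print(rule_support, rule_confidence)
```

Here three of the five baskets contain both milk and bread, and three of the four baskets with milk also contain bread, so a store might consider placing or discounting the two products together.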
Some algorithms:
• K-Means Clustering
• DBSCAN – Density-Based Spatial Clustering of Applications with Noise
• BIRCH – Balanced Iterative Reducing and Clustering using Hierarchies
• Hierarchical Clustering
1.3.3 Semi-supervised Learning:
As the name suggests, this approach lies between supervised and unsupervised
techniques. We use it when we are dealing with data of which a small portion is
labeled and the remaining large portion is unlabeled. We can use unsupervised
techniques to predict labels and then feed these labels to supervised techniques.
This technique is mostly applicable to image datasets, where usually not all
images are labeled.
1.3.4 Reinforcement Learning:
In this technique, the model keeps improving its performance using reward
feedback to learn the behavior or pattern. These algorithms are specific to a particular
problem, e.g., Google's self-driving car, or AlphaGo, where a bot competes with humans and
even with itself to become a better and better player of the game of Go. Each time we feed in
data, the model learns and adds the data to its knowledge, which is its training data. So, the
more it learns, the better trained and hence more experienced it becomes.
Figure 1.3.4 Reinforcement Learning
• Agents observe input.
• An agent performs an action by making some decisions.
• After its performance, an agent receives a reward and is reinforced accordingly, and
the model stores the information as state-action pairs.
Some algorithms:
• Temporal Difference (TD)
• Q-Learning
• Deep Adversarial Networks
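The reward-driven loop above can be sketched with a tabular Q-learning update on a toy problem. Everything in this example is an assumption made for illustration: the five-cell corridor, the reward of 1 at the last cell, and the learning-rate and discount values. To keep the run reproducible, it sweeps over every state-action pair instead of exploring randomly, which is a simplification of how an agent would normally gather experience.

```python
# Hypothetical environment: a corridor of 5 cells; the agent moves left (-1) or
# right (+1) and earns a reward of 1 on reaching the last cell.
N_STATES = 5
ACTIONS = (-1, +1)
ALPHA, GAMMA = 0.5, 0.9  # learning rate and discount factor (assumed values)


def step(state, action):
    """Return (next_state, reward) for a deterministic move along the corridor."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward


# Q-table: one value per state-action pair, all starting at zero.
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for _ in range(50):  # repeated sweeps instead of random exploration
    for s in range(N_STATES - 1):  # the last cell is terminal
        for a in ACTIONS:
            nxt, reward = step(s, a)
            best_next = 0.0 if nxt == N_STATES - 1 else max(q[(nxt, b)] for b in ACTIONS)
            # Temporal-difference update towards reward + discounted best next value.
            q[(s, a)] += ALPHA * (reward + GAMMA * best_next - q[(s, a)])

# Greedy policy: in each non-terminal cell, pick the action with the larger Q-value.
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

The stored state-action values end up favouring the move towards the rewarding cell from every position, which is the behaviour the reward was meant to reinforce.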
1.4 Linear Regression
In statistics, linear regression is a linear approach for modelling the relationship
between a scalar response and one or more explanatory variables (also known as
dependent and independent variables). The case of one explanatory variable is called
simple linear regression; for more than one, the process is called multiple linear
regression. This term is distinct from multivariate linear regression, where multiple
correlated dependent variables are predicted, rather than a single scalar variable.
In linear regression, the relationships are modeled using linear predictor functions whose
unknown model parameters are estimated from the data. Such models are called linear
models. Most commonly, the conditional mean of the response given the values of the
explanatory variables (or predictors) is assumed to be an affine function of those values;
less commonly, the conditional median or some other quantile is used. Like all forms of
regression analysis, linear regression focuses on the conditional probability distribution
of the response given the values of the predictors, rather than on the joint probability
distribution of all of these variables, which is the domain of multivariate analysis.
Linear regression was the first type of regression analysis to be studied rigorously, and to
be used extensively in practical applications. This is because models which depend
linearly on their unknown parameters are easier to fit than models which are non-linearly
related to their parameters and because the statistical properties of the resulting estimators
are easier to determine.
Figure 1.4 Linear Regression
Linear regression has many practical uses. Most applications fall into one of the
following two broad categories:
• If the goal is prediction, forecasting, or error reduction, linear
regression can be used to fit a predictive model to an observed data set of values of
the response and explanatory variables. After developing such a model, if additional
values of the explanatory variables are collected without an accompanying response
value, the fitted model can be used to make a prediction of the response.
• If the goal is to explain variation in the response variable that can be attributed to
variation in the explanatory variables, linear regression analysis can be applied to
quantify the strength of the relationship between the response and the explanatory
variables, and in particular to determine whether some explanatory variables may
have no linear relationship with the response at all, or to identify which subsets of
explanatory variables may contain redundant information about the response.
Linear regression models are often fitted using the least squares approach, but they may
also be fitted in other ways, such as by minimizing the "lack of fit" in some other norm
(as with least absolute deviations regression), or by minimizing a penalized version of the
least squares cost function as in ridge regression (L2-norm penalty) and lasso (L1-norm
penalty). Conversely, the least squares approach can be used to fit models that are not
linear models. Thus, although the terms "least squares" and "linear model" are closely
linked, they are not synonymous.
Given a data set $\{y_i, x_{i1}, \ldots, x_{ip}\}_{i=1}^{n}$ of n statistical units, a linear regression model assumes
that the relationship between the dependent variable y and the p-vector of regressors x is
linear. This relationship is modeled through a disturbance term or error variable ε, an
unobserved random variable that adds "noise" to the linear relationship between the
dependent variable and regressors. Thus the model takes the form
\[ y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i = \mathbf{x}_i^{T}\boldsymbol{\beta} + \varepsilon_i, \qquad i = 1, \ldots, n, \]
where T denotes the transpose, so that $\mathbf{x}_i^{T}\boldsymbol{\beta}$ is the inner product between vectors $\mathbf{x}_i$ and $\boldsymbol{\beta}$.
Often these n equations are stacked together and written in matrix notation as
\[ \mathbf{y} = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}. \]
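For the least squares approach mentioned earlier, the matrix form leads to a closed-form estimate of the coefficients. The short derivation below is a standard result added for illustration; it assumes that $X$ contains a leading column of ones for the intercept $\beta_0$ and that $X^{T}X$ is invertible, and the hat notation $\hat{\boldsymbol{\beta}}$ for the estimate is introduced here.

```latex
\hat{\boldsymbol{\beta}}
  = \arg\min_{\boldsymbol{\beta}} \lVert \mathbf{y} - X\boldsymbol{\beta} \rVert^{2}
\quad\Longrightarrow\quad
X^{T}X\,\hat{\boldsymbol{\beta}} = X^{T}\mathbf{y}
\quad\Longrightarrow\quad
\hat{\boldsymbol{\beta}} = (X^{T}X)^{-1}X^{T}\mathbf{y}
```

The middle equation (the normal equations) follows from setting the gradient of the squared error with respect to $\boldsymbol{\beta}$ to zero.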
The very simplest case of a single scalar predictor variable x and a single scalar response
variable y is known as simple linear regression. The extension to multiple and/or vector-
valued predictor variables (denoted with a capital X) is known as multiple linear
regression, also known as multivariable linear regression (not to be confused with
multivariate linear regression).
Multiple linear regression is a generalization of simple linear regression to the case of
more than one independent variable, and a special case of general linear models,
restricted to one dependent variable. The basic model for multiple linear regression is
\[ Y_i = \beta_0 + \beta_1 X_{i1} + \cdots + \beta_p X_{ip} + \varepsilon_i \]
for each observation i = 1, ..., n.
In the formula above we consider n observations of one dependent variable and p
independent variables. Thus, $Y_i$ is the ith observation of the dependent variable, and $X_{ij}$ is the ith
observation of the jth independent variable, j = 1, 2, ..., p. The values $\beta_j$ represent
parameters to be estimated, and $\varepsilon_i$ is the ith independent identically distributed normal
error.
In the more general multivariate linear regression, there is one equation of the above
form for each of m > 1 dependent variables that share the same set of explanatory
variables and hence are estimated simultaneously with each other:
\[ Y_{ij} = \beta_{0j} + \beta_{1j} X_{i1} + \cdots + \beta_{pj} X_{ip} + \varepsilon_{ij} \]
for all observations indexed as i = 1, ..., n and for all dependent variables indexed as j =
1, ..., m.
Nearly all real-world regression models involve multiple predictors, and basic
descriptions of linear regression are often phrased in terms of the multiple regression
model. Note, however, that in these cases the response variable y is still a scalar. Another
term, multivariate linear regression, refers to cases where y is a vector, i.e., the same as
general linear regression.
1.4.1 Types of loss in a linear model:
L1 loss: This is the average absolute difference between the predicted and actual values.
It is also called mean absolute error (MAE).
The model computes the absolute error for each observation and averages these errors to
obtain the L1 loss. The formula of L1 loss is shown below.
$$MAE = \frac{1}{N} \sum_{i=1}^{N} |y_i - \hat{y}_i|$$
where $\hat{y}_i$ is the predicted value of y
$y_i$ is the actual value of y
L2 loss: In this loss, we take the average of the squared differences between the predicted
and actual values. It is also known as mean squared error (MSE). The formula of L2 loss is
shown below.
$$MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$
where $\hat{y}_i$ is the predicted value of y
$y_i$ is the actual value of y
RMSE: This is the square root of the L2 loss, i.e. of the MSE, so the error is expressed in the
same units as y. The formula of RMSE is shown below.
$$RMSE = \sqrt{MSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$
where $\hat{y}_i$ is the predicted value of y
$y_i$ is the actual value of y
R-squared: It measures how well the line predicted by the model fits the actual values
of the data. The coefficient typically ranges from 0 to 1, and a value close to 1 indicates a
well-fitted line. The formula is shown below.
$$R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}$$
where $\hat{y}_i$ is the predicted value of y
$\bar{y}$ is the mean value of y
Note: When the data contain outliers, L1 loss can be preferable, because L2 loss squares each
error and so gives outliers a much larger loss value. Alternatively, we can remove the
outliers first and then use L2 loss.
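The four measures above can be sketched in plain Python as follows. The price lists are made-up numbers chosen for easy arithmetic, not results from this project, and the function names are illustrative. The last lines show how a single outlying prediction raises the MSE far more than the MAE.

```python
import math


def mae(actual, predicted):
    """Mean absolute error (L1 loss)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error (L2 loss)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


def r_squared(actual, predicted):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up actual and predicted prices (illustrative only).
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
print(mae(actual, predicted), mse(actual, predicted),
      rmse(actual, predicted), r_squared(actual, predicted))

# The same predictions with one outlying value in the last position.
with_outlier = [2.0, 5.0, 8.0, 19.0]
print(mae(actual, with_outlier), mse(actual, with_outlier))
```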
Learning Rate:
Alpha (α) is the learning rate in the gradient descent update formula. Its function is to
control the step size of gradient descent as it moves towards the minimum. The value of
alpha should be chosen carefully: if it is too large, the update can overshoot the minimum,
and if it is too small, it takes a long time to reach the minimum.
$$\theta_{new} = \theta_{old} - \alpha \frac{\partial L}{\partial \theta_{old}}$$
1.4.2 Gradient Descent:
To update the θ1 and θ2 values so as to reduce the cost function (minimizing the RMSE
value) and achieve the best-fit line, the model uses gradient descent. The idea is to start
with random θ1 and θ2 values and then iteratively update them until the cost reaches its
minimum.
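A minimal sketch of this procedure for a single predictor is given below. It makes several assumptions that are not taken from the project: it minimizes the MSE (which has the same minimizer as the RMSE), it starts from zero rather than from random values so that the run is repeatable, and the learning rate, iteration count and data points are chosen only for illustration.

```python
def gradient_descent(xs, ys, alpha=0.05, iterations=2000):
    """Fit y = theta1 + theta2 * x by gradient descent on the MSE."""
    theta1, theta2 = 0.0, 0.0  # intercept and slope; zero start is an assumption
    n = len(xs)
    for _ in range(iterations):
        # Prediction error for every observation under the current parameters.
        errors = [theta1 + theta2 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the MSE with respect to theta1 and theta2.
        grad1 = 2 / n * sum(errors)
        grad2 = 2 / n * sum(e * x for e, x in zip(errors, xs))
        # Update rule: theta_new = theta_old - alpha * gradient.
        theta1 -= alpha * grad1
        theta2 -= alpha * grad2
    return theta1, theta2


# Made-up points lying exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta1, theta2 = gradient_descent(xs, ys)
print(round(theta1, 4), round(theta2, 4))
```

With a much larger alpha on the same data the updates can diverge instead of settling, which is the overshooting behavior described under Learning Rate.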
1.4.3 One Hot Encoding:
Most machine learning algorithms cannot work with categorical data directly, so it needs
to be converted into numerical data. Sometimes in datasets, we encounter columns that
contain categorical features (string values); for example, the parameter Gender will have
categorical values like Male and Female. These labels have no specific order of
preference, and since they are strings, machine learning models cannot use them as they
are.
One approach to solve this problem is label encoding, where we assign a
numerical value to each label, for example Male and Female mapped to 0 and 1. But this
can add bias to our model, as it may misinterpret the numbers as a hierarchy and give
higher preference to the Female label because 1 > 0, whereas ideally both labels are
equally important in the dataset. To deal with this issue we use the One Hot Encoding
technique.
In this technique, a separate column is created for each label, here Male
and Female. Wherever the label is Male, the value is 1 in the Male column and 0
in the Female column, and vice versa. Let’s understand with an example: consider data
in which fruits, their corresponding categorical values, and their prices are given.
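The idea can be sketched in a few lines of plain Python. The fruit names below are invented for illustration, since the report's own example table is not reproduced in the text, and `one_hot_encode` is a hypothetical helper rather than a library function.

```python
def one_hot_encode(values):
    """Return the sorted category names and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical categorical column.
fruits = ["apple", "mango", "apple", "orange"]
categories, encoded = one_hot_encode(fruits)

print(categories)
for fruit, row in zip(fruits, encoded):
    print(fruit, row)
```

Each row contains exactly one 1, so no category is ranked above another.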
1.5 Objective & Problem Statement
Objective of the Project - The goal of this project is to create an efficient and
effective model that can predict the price of a used car with good accuracy by using the
Linear Regression algorithm. The prediction is based on attributes such as:
• Brand or type of car one prefers, like Ford or Hyundai
• Model of the car, such as Ford Figo or Hyundai Creta
• Year of manufacture, like 2020 or 2021
• Type of fuel, such as Petrol or Diesel
• Number of kilometers the car has travelled
Problem Statement - It is easy for any company to price its new cars based on the
manufacturing and marketing costs involved. But for a used car it is quite
difficult to define a price, because it is influenced by various parameters such as
car brand, year of manufacture, etc. The goal of our project is to predict the best price
for a pre-owned car in the Indian market, based on previous data on sold cars,
using Linear Regression.
1.6 Purpose of Project
The used car market is an ever-rising industry, which has almost doubled its market
value in the last few years. The emergence of online portals such as CarDekho, Quikr,
Carwale, Cars24, and many others has made it easier for both the customer and the
seller to be better informed about the trends and patterns that determine the value of a
used car in the market. Machine Learning algorithms can be used to predict the retail
value of a car, based on a certain set of features. The purpose of this project is to provide
car price prediction using machine learning without any human interference.
Cars are bought and sold every day, yet there are
limited facilities and applications for getting an appropriate price for one’s car. This
application can be used to get an estimated value of the car.
1.7 Architecture Diagram
Fig 1.7 – Architecture of Linear Regression (M.L)
1.8 Project Goal
We are required to model the price of cars with the available independent
variables. It will be used by the management to understand how exactly the prices vary
with the independent variables. They can accordingly manipulate the design of the cars,
the business strategy etc. to meet certain price levels. Further, the model will be a good
way for management to understand the pricing dynamics of a new market.
CHAPTER -2
2. SYSTEM REQUIREMENT SPECIFICATIONS
2.1 What is SRS?
Software Requirement Specification (SRS) is the starting point of the software
development activity. As systems grew more complex, it became evident that the goals of
the entire system could not be easily comprehended. Hence the need for the requirement
phase arose. A software project is initiated by the client's needs. The SRS is the means
of translating the ideas in the minds of the clients (the input) into a formal document
(the output of the requirement phase).
The SRS phase consists of two basic activities:
Problem/Requirement Analysis:
This process is the harder and more nebulous of the two. It deals with understanding the
problem, the goals and the constraints.
Requirement Specification:
Here, the focus is on specifying what has been found during analysis. Issues such as
representation, specification languages and tools, and checking of the specifications are
addressed during this activity.
The requirement phase terminates with the production of the validated SRS
document. Producing the SRS document is the basic goal of this phase.
2.1.1 Role of SRS:
The purpose of the Software Requirement Specification is to reduce the
communication gap between the clients and the developers. Software Requirement
Specification is the medium through which the client and user needs are
accurately specified. It forms the basis of software development. A good SRS should
satisfy all the parties involved in the system.
2.2 Requirements Specification Document
A Software Requirements Specification (SRS) is a document that describes the
nature of a project, software or application. In simple words, an SRS document is a manual
for a project, provided it is prepared before you kick-start the project/application. This
document is also known by the names SRS report, software document. A software
document is primarily prepared for a project, software or any kind of application.
There are a set of guidelines to be followed while preparing the software requirement
specification document. This includes the purpose, scope, functional and non-functional
requirements, software and hardware requirements of the project. In addition to this, it
also contains the information about environmental conditions required, safety and
security requirements, software quality attributes of the project etc.
The purpose of the SRS (Software Requirement Specification) document is to describe the
external behavior of the application or software being developed. It defines the operations,
performance, interfaces and quality assurance requirements of the application or
software. The complete software requirements for the system are captured by the SRS.
This section introduces the requirement specification document for Car Price Prediction
using Linear Regression, which lists functional as well as non-functional requirements.
2.2 Functional Requirements
For documenting the functional requirements, the set of functionalities supported by
the system are to be specified. A function can be specified by identifying the state at
which data is to be input to the system, its input data domain, the output domain, and the
type of processing to be carried on the input data to obtain the output data. Functional
requirements define specific behavior or function of the application. Following are the
functional requirements:
FR1) After registration, the user’s details should be stored in MySQL.
FR2) Entering login details should show the user’s data.
FR3) The login page should redirect to the next page (home).
FR4) The attributes should be shown after redirecting to the home page.
FR5) After the attributes are entered, the predicted price should be shown.
2.3 Non-Functional Requirements
A non-functional requirement is a requirement that specifies criteria that can be used
to judge the operation of a system, rather than specific behaviors. In particular, these are
the constraints the system must work within. Following are the non-functional
requirements:
NFR 1) The system must work properly without bugs.
NFR 2) There should be no lag in showing the price.
NFR 3) The database should return the correct user data.
NFR 4) Attributes must be displayed properly to the user.
2.3.1 Performance:
The performance of the developed application can be assessed by
measurement. Measuring enables you to identify how the performance of your application
stands in relation to your defined performance goals and helps you to identify the
bottlenecks that affect your application performance. It helps you identify whether your
application is moving toward or away from your performance goals. Defining what you
will measure, that is, your metrics, and defining the objectives for each metric is a
critical part of your testing plan.
Performance objectives include the following:
Response time, latency, throughput or resource utilization.
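As one possible way to measure response time, the sketch below averages the wall-clock time of repeated calls using Python's standard `time.perf_counter`. The `dummy_predict` function and its coefficients are hypothetical stand-ins for the project's trained model, which is not shown in this section.

```python
import time


def measure_response_time(func, *args, repeats=100):
    """Return the average wall-clock time in seconds of one call to func."""
    start = time.perf_counter()
    for _ in range(repeats):
        func(*args)
    return (time.perf_counter() - start) / repeats


def dummy_predict(year, kms_driven):
    # Hypothetical stand-in for the trained model's prediction call.
    return 500000 - 20000 * (2023 - year) - 0.5 * kms_driven


avg_seconds = measure_response_time(dummy_predict, 2019, 40000)
print(f"average response time: {avg_seconds:.8f} s")
```

The measured value can then be compared against whatever response-time objective is set in the testing plan.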
2.4 Software Requirements
Operating System : Windows 10/11 or macOS.
Platform : Google Colab, PyCharm IDE
Programming Language : Python, SQL
2.5 Hardware Requirements
Processor : Intel Core i3 and above.
Hard Disk : 1 TB or above.
RAM : 4 GB or above.
Internet : 1 Mbps or above (Wireless).
What is STLC?
The process of testing software in a well-planned and systematic way is known
as the software testing life cycle (STLC). Different organizations have different
phases in STLC; however, a generic Software Test Life Cycle (STLC) for the waterfall
development model consists of the following phases:
1. Requirements Analysis
2. Test Planning
3. Test Analysis
4. Test Design
5. Test Construction and Verification
6. Test Execution and Bug Reporting
7. Final Testing and Implementation
8. Post Implementation
2.6 Requirements Analysis
In this phase testers analyze the customer requirements and work with developers
during the design phase to see which requirements are testable and how they are going to
test those requirements. It is very important to start testing activities from the
requirements phase itself, because the cost of fixing a defect is much lower if it is found in
the requirements phase rather than in later phases. In this phase all the planning about
testing is done: what needs to be tested, how the testing will be done, the test strategy to
be followed, what the test environment will be, what test methodologies will be
followed, hardware and software availability, resources, risks, etc. A high-level test plan
document is created which includes all the planning inputs mentioned above and is
circulated to the stakeholders.
2.7 Test Construction and Verification
In this phase testers prepare more test cases, keeping in mind the positive and
negative scenarios, end-user scenarios, etc. All the test cases and automation scripts need
to be completed in this phase and reviewed by the stakeholders. The test plan
document should also be finalized and verified by reviewers.
2.8 Test Execution and Bug Reporting
Once unit testing is done by the developers and the test team gets the test build, the
test cases are executed and defects are reported in the bug tracking tool. After the test
execution is complete and all the defects are reported, test execution reports are created
and circulated to project stakeholders. After developers fix the bugs raised by testers,
they give another build with fixes to the testers, who do re-testing and regression testing to
ensure that each defect has been fixed and has not affected any other areas of the software.
Testing is an iterative process, i.e. if a defect is found and fixed, testing needs to be done
after every defect fix. After the testers confirm that defects have been fixed and no more
critical defects remain in the software, the build is given for final testing.
2.9 Final Testing and Implementation
In this phase the final testing is done for the software; non-functional testing such as
stress, load and performance testing is performed in this phase. The software is also
verified in a production-like environment. Final test execution reports and
documents are prepared in this phase.
2.10 Post Implementation
In this phase the test environment is cleaned up and restored to its default state, the
process review meetings are held and lessons learnt are documented. A document is
prepared to help cope with similar problems in future releases.
Phase | Activities | Outcome
------|------------|--------
Planning | Create high level test plan | Test plan, Refined Specification
Analysis | Create detailed test plan, Functional Validation Matrix, test cases | Revised Test Plan, Functional Validation Matrix, test cases
Design | Test cases are revised, select which test cases to automate | Revised test cases, test data sets, risk assessment sheet
Construction | Scripting of test cases to automate | Test procedures/Scripts, Drivers, test results, Bug reports
Testing cycles | Complete testing cycles | Test results, Bug reports
Final testing | Execute remaining stress and performance tests, complete documentation | Test results and different metrics on test efforts
Post implementation | Evaluate testing processes | Plan for improvement of testing process
Table 3.7 – Activities and Outcomes of each phase in SDLC
2.11 Technologies Used:
2.11.1 Google Colab
Colaboratory, or "Colab" for short, is a product from Google Research. Colab allows
anybody to write and execute arbitrary Python code through the browser, and is especially
well suited to machine learning, data analysis and education. More technically, Colab is a
hosted Jupyter notebook service that requires no setup to use, while providing access free of
charge to computing resources including GPUs.
Is Jupyter Notebook like Google Colab?
Google Colab's major differentiator from Jupyter Notebook is that it is cloud-based and
Jupyter is not. This means that if you work in Google Colab, you do not have to worry
about downloading and installing anything to your hardware.
Fig 3.8.1 – Google colab
2.11.2 PyCharm IDE
PyCharm is a dedicated Python Integrated Development Environment (IDE)
providing a wide range of essential tools for Python developers, tightly integrated to
create a convenient environment for productive Python, web, and data science
development.
JetBrains s.r.o. (formerly IntelliJ Software s.r.o.) is a Czech software development
company which makes tools for software developers and project managers. The company
offers integrated development environments (IDEs) for the programming languages Java,
Groovy, Kotlin, Ruby, Python, PHP, C, Objective-C, C++, C#, F#, Go, JavaScript, and
the domain-specific language SQL.
2.11.3 SQL
SQL (Structured Query Language) is a powerful and standard query language for
relational database systems. We use SQL to perform CRUD (Create, Read, Update,
Delete) operations on databases along with other various operations. SQL has evolved a
lot in the past decade.
RDBMS
RDBMS stands for Relational Database Management System. RDBMS is the basis for
SQL, and for all modern database systems such as MS SQL Server, IBM DB2, Oracle,
MySQL, and Microsoft Access. The data in an RDBMS is stored in database objects called
tables. A table is a collection of related data entries and it consists of columns and rows.
Although SQL is an ANSI/ISO standard, there are different versions of the SQL
language. However, to be compliant with the ANSI standard, they all support at least the
major commands (such as SELECT, UPDATE, DELETE, INSERT, WHERE) in a
similar manner.
MySQL, the most popular Open Source SQL database management system, is
developed, distributed, and supported by Oracle Corporation.
MySQL is a database management system.
A database is a structured collection of data. It may be anything from a simple shopping
list to a picture gallery or the vast amounts of information in a corporate network. To add,
access, and process data stored in a computer database, you need a database management
system such as MySQL Server. Since computers are very good at handling large amounts
of data, database management systems play a central role in computing, as standalone
utilities, or as parts of other applications.
Using SQL in Your Web Site
To build a web site that shows data from a database, you will need:
• An RDBMS database program (e.g. MS Access, SQL Server, MySQL)
• To use a server-side scripting language, like PHP or Python
• To use SQL to get the data you want
• To use HTML / CSS to style the page
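The CRUD operations mentioned earlier can be sketched from Python as follows. This example uses the standard-library `sqlite3` module with an in-memory database purely so that it runs without a server; the project itself stores data in MySQL, and the `users` table, its columns and the sample values are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the project's MySQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: define a table and insert one row (parameterized to avoid SQL injection).
cur.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
cur.execute("INSERT INTO users (name, email) VALUES (?, ?)",
            ("Asha", "asha@example.com"))

# Read: fetch the row back.
cur.execute("SELECT name, email FROM users WHERE name = ?", ("Asha",))
row_after_insert = cur.fetchone()

# Update: change the stored email address.
cur.execute("UPDATE users SET email = ? WHERE name = ?",
            ("asha@example.org", "Asha"))
cur.execute("SELECT email FROM users WHERE name = ?", ("Asha",))
email_after_update = cur.fetchone()[0]

# Delete: remove the row and count what is left.
cur.execute("DELETE FROM users WHERE name = ?", ("Asha",))
cur.execute("SELECT COUNT(*) FROM users")
remaining = cur.fetchone()[0]

conn.commit()
conn.close()
print(row_after_insert, email_after_update, remaining)
```

The same SQL statements would be sent through a MySQL driver in the deployed application, though placeholder syntax and connection setup differ between drivers.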
2.11.4 Flask
Flask is a micro web framework written in Python. It is classified as a micro
framework because it does not require particular tools or libraries. It has no database
abstraction layer, form validation, or any other components where pre-existing third-party
libraries provide common functions.
CHAPTER -3
3. LITERATURE SURVEY
3.1 Paper work
Overfitting and underfitting come into the picture when we create our statistical
models. The models might be too biased towards the training data and might not perform well
on the test dataset. This is called overfitting. Likewise, the models might not take into
consideration all the variance present in the population and perform poorly on a test data
set. This is called underfitting. A perfect balance needs to be achieved between these two,
which leads to the concept of Bias-Variance tradeoff. Pierre Geurts has introduced and
explained how bias-variance tradeoff is achieved in both regression and classification.
The selection of variables/attribute plays a vital role in influencing both the bias and
variance of the statistical model. Robert Tibshirani proposed a new method called Lasso,
which minimizes the residual sum of squares subject to a bound on the sum of the absolute
values of the coefficients. This returns a subset of attributes which
need to be included in multiple regression to get the minimal error rate. Similarly,
decision trees suffer from overfitting if they are not pruned/shrunk. Trevor Hastie and
Daryl Pregibon have explained the concept of pruning in their research paper. Moreover,
hypothesis testing using ANOVA is needed to verify whether the different groups of
errors really differ from each other. This is explained by TK Kim and Tae Kyun in their
paper. A Post-Hoc test needs to be performed along with ANOVA if the number of
groups exceeds two.
Tukey’s Test has been explored by Haynes W. in his research paper. Using these
techniques, we will create, train and test the effectiveness of our statistical models.
The paper "Predicting the Price of Used Cars Using Machine Learning Techniques"
investigates the application of supervised machine learning techniques to predict
the price of used cars in Mauritius. The predictions are based on historical data collected from
daily newspapers. Different techniques like multiple linear regression analysis, k-nearest
neighbors, naïve Bayes and decision trees have been used to make the predictions.
The paper "Car Price Prediction Using Machine Learning Techniques" examines a considerable
number of distinct attributes for reliable and accurate prediction. To build
a model for predicting the price of used cars in Bosnia and Herzegovina, the authors
applied three machine learning techniques (Artificial Neural Network, Support Vector
Machine and Random Forest).
The paper "Price Evaluation model in second hand car system based on BP neural
networks" proposes a price evaluation model based on big data analysis,
which takes advantage of widely circulated vehicle data and a large number of
vehicle transaction data to analyze the price data for each type of vehicle by using an
optimized BP neural network algorithm. It aims to establish a second-hand car price
evaluation model to get the price that best matches the car.
3.2 PROPOSED MODEL
Null Hypothesis
Even though the magnitude of overfitting has been reduced, regression trees still suffer
from overfitting even after pruning. This leads to our following hypothesis.
Hypothesis: Multiple and Lasso Regressions are better at predicting price than the
Regression Tree.
Training and Testing Data
The data is split into training (70% - 563 records) and testing (30% - 241 records) data
sets through random sampling (seed was set to 2786).
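A 70/30 random split of this kind can be sketched with Python's standard `random` module. The record list below is a placeholder of 804 integers (563 + 241), and although the seed value 2786 is reused, a different random number generator will not reproduce the exact sample the authors obtained; the tool they used is not stated in this section.

```python
import random


def train_test_split(records, train_fraction=0.7, seed=2786):
    """Shuffle a copy of the records with a fixed seed and cut it in two."""
    rng = random.Random(seed)      # fixed seed makes the split repeatable
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Placeholder data: 804 record ids standing in for the car listings.
records = list(range(804))
train, test = train_test_split(records)
print(len(train), len(test))
```

Because the seed is fixed, calling the function again on the same records returns the same two sets.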
Linear Regression
In statistics, linear regression is a linear approach for modelling the relationship between
a scalar response and one or more explanatory variables (also known as dependent and
independent variables). The case of one explanatory variable is called simple linear
regression; for more than one, the process is called multiple linear regression. This term
is distinct from multivariate linear regression, where multiple correlated dependent
variables are predicted, rather than a single scalar variable.
3.3 Related work
Researchers often predict the prices of products using previous data, and
so did Pudaruth, who predicted prices of cars in Mauritius; these cars were not new but
second-hand. He used multiple linear regression, k-nearest neighbors, naïve Bayes and
decision tree algorithms in order to predict the prices. The comparison of prediction
results from these techniques showed that the prices from these methods are closely
comparable. However, it was found that the decision tree algorithm and naïve Bayes method
were unable to classify and predict numeric values. Pudaruth’s research also concluded
that a limited number of instances in the data set does not offer high prediction accuracy.
Multivariate regression models help in classifying and predicting values of numeric
format. Kuiper used this model to predict the price of 2005 General Motors (GM) cars. The
price prediction of cars does not require any special knowledge, so data available
online, like the data available on www.pakwheels.com, is enough to predict prices. Kuiper
did the same, i.e. car price prediction, and introduced variable selection techniques which
helped in finding which variables are more relevant for inclusion in the model. Kuiper also
encouraged students to use different models and find out how checking model assumptions
works. Another similar research by Listiani uses Support Vector Machines (SVM) to
predict the prices of leased cars. This research showed that SVM is far more accurate in
predicting prices as compared to multiple linear regression when a very large dataset
is available. SVM also handles high-dimensional data better and avoids both the
under-fitting and over-fitting issues. A genetic algorithm is used by Listiani to find important
features for SVM. However, the study does not show in terms of variance and mean
standard deviation why SVM is better than simple multiple regression.
CHAPTER -4
4. SYSTEM DESIGN
4.1 Introduction to UML
The Unified Modeling Language allows the software engineer to express an analysis
model using a modeling notation that is governed by a set of syntactic, semantic and
pragmatic rules. A UML system is represented using five different views that describe
the system from distinctly different perspectives. Each view is defined by a set of
diagrams, as follows:
1. User Model View
This view represents the system from the users’ perspective. The analysis
representation describes a usage scenario from the end-users’ perspective.
2. Structural Model View
In this model, the data and functionality are viewed from inside the system. This
model view models the static structures.
3. Behavioral Model View
It represents the dynamic or behavioral aspects of the system, depicting the
interactions or collaborations between the various structural elements described in the
user model and structural model views.
4. Implementation Model View
In this view, the structural and behavioral aspects of the system are represented
as they are to be built.
5. Environmental Model View
In this view, the structural and behavioral aspects of the environment in which
the system is to be implemented are represented.
4.2 UML Diagrams
4.2.1 Use Case Diagram
To model a system, the most important aspect is to capture its dynamic behavior. To
clarify in a bit more detail, dynamic behavior means the behavior of the system when it is
running/operating.
Static behavior alone is not sufficient to model a system; dynamic
behavior is more important than static behavior. In UML there are five diagrams
available to model dynamic nature, and the use case diagram is one of them. Since
the use case diagram is dynamic in nature, there should
be some internal or external factors for making the interaction.
These internal and external agents are known as actors. Use case diagrams
consist of actors, use cases and their relationships. The diagram is used to
model the system/subsystem of an application. A single use case diagram
captures a particular functionality of a system, so to model the entire system
a number of use case diagrams are used.
Use case diagrams are used to gather the requirements of a system, including
internal and external influences. These requirements are mostly design
requirements. When a system is analysed to gather its functionalities, use
cases are prepared and actors are identified. In brief, the purposes of use case
diagrams can be as follows:
a. Used to gather requirements of a system.
b. Used to get an outside view of a system.
c. Identify external and internal factors influencing the system.
d. Show the interactions among the requirements and the actors.
Fig 4.2.1 – Use Case Diagram
4.2.2 Sequence Diagram
Sequence diagrams describe interactions among classes in terms of an exchange
of messages over time. They're also called event diagrams. A sequence diagram is a good
way to visualize and validate various runtime scenarios. These can help to predict how a
system will behave and to discover responsibilities a class may need to have in the
process of modelling a new system.
The aim of a sequence diagram is to define event sequences, which would have a desired
outcome. The focus is more on the order in which messages occur than on the message
per se. However, the majority of sequence diagrams will communicate what messages
are sent and the order in which they tend to occur.
Basic Sequence Diagram Notations
Class Roles or Participants
Class roles describe the way an object will behave in context. Use the UML object
symbol to illustrate class roles, but don't list object attributes.
Activation or Execution Occurrence
Activation boxes represent the time an object needs to complete a task. When an object
is busy executing a process or waiting for a reply message, use a thin grey rectangle
placed vertically on its lifeline.
Fig 4.2.2 – Sequence Diagram
4.2.3 Class Diagram
Class diagrams are the main building blocks of every object-oriented method. The
class diagram can be used to show the classes, relationships, interfaces, associations, and
collaborations. UML is standardized in class diagrams. Since classes are the building
blocks of an application that is based on OOP, the class diagram has an appropriate
structure to represent the classes, inheritance, relationships, and everything that OOP
has in its context. It describes various kinds of objects and the static relationships
between them. Each class is represented by a rectangle subdivided into three compartments:
name, attributes and operations.
The main purpose to use class diagrams are:
1. This is the only UML which can appropriately depict various aspects of
OOPsconcept.
2. Proper design and analysis of application can be faster and efficient.
3. It is base for deployment and component diagram.
Figure 4.2.3 Class Diagram
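The three compartments of a class (name, attributes, operations) map directly onto code. The following is a minimal sketch only: the class name Car, its attributes, and the age operation are hypothetical, chosen to echo the dataset columns, and are not taken from the project's class diagram.

```python
from dataclasses import dataclass


@dataclass
class Car:
    """Name compartment: the class is called Car (hypothetical example)."""

    # Attributes compartment: the data each object holds.
    name: str
    company: str
    year: int
    kms_driven: int
    fuel_type: str

    # Operations compartment: the behaviour the class offers.
    def age(self, current_year: int) -> int:
        """Return the age of the car in years relative to current_year."""
        return current_year - self.year


car = Car("Hyundai Santro Xing", "Hyundai", 2007, 45000, "Petrol")
print(car.age(2023))
```

A class diagram would draw Car as one rectangle with these three parts stacked from top to bottom.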
4.2.4 System Design
A software module is the lowest level of design granularity in the system.
Depending on the software development approach, there may be one or more modules per
system. This section should provide enough detailed information about logic and data
necessary to completely write source code for all modules in the system (and/or integrate
COTS software programs).
If there are many modules or if the module documentation is extensive, place it in an
appendix or reference a separate document. Add additional diagrams and information, if
necessary, to describe each module, its functionality, and its hierarchy. Industry-standard
module specification practices should be followed. Include the following information in
the detailed module designs:
• A narrative description of each module, its function(s), the conditions under which
it is used (called or scheduled for execution), its overall processing, logic,
interfaces to other modules, interfaces to external systems, security requirements,
etc.; explain any algorithms used by the module in detail
• For COTS packages, specify any call routines or bridging programs to integrate the
package with the system and/or other COTS packages (for example, Dynamic Link
Libraries)
• Data elements, record structures, and file structures associated with module input
and output
• Graphical representation of the module processing, logic, flow of control, and
algorithms, using an accepted diagramming approach (for example, structure
charts, action diagrams, flowcharts, etc.)
• Data entry and data output graphics; define or reference associated data elements;
if the project is large and complex or if the detailed module designs will be
incorporated into a separate document, then it may be appropriate to repeat the
screen information in this section
• Report layout
Figure 4.2.4 System Design
4.2.5 State Chart Diagram
The name of the diagram itself indicates its purpose. A Statechart diagram
describes the different states of a component in a system. The states are specific to
a component or object of the system.
A Statechart diagram describes a state machine. A state machine can be defined as a
machine that defines the different states of an object, where these states are controlled by
external or internal events.
An activity diagram, explained in the next chapter, is a special kind of Statechart diagram.
Because a Statechart diagram defines the states, it is used to model the lifetime of an object.
4.2.5.1 How to Draw a Statechart Diagram?
A Statechart diagram is used to describe the states of different objects over their life
cycles. Emphasis is placed on the state changes caused by internal or external events.
These states must be analyzed carefully so that they can be implemented accurately.
Statechart diagrams are very important for describing the states. A state can be identified
as the condition of an object when a particular event occurs.
Before drawing a Statechart diagram, we should clarify the following points:
• Identify the important objects to be analyzed.
• Identify the states.
• Identify the events.
The following is an example of a Statechart diagram in which the state of an Order object is
analyzed.
The first state is an idle state from which the process starts. The next states are reached on
events such as send request, confirm request, and dispatch order. These events are
responsible for the state changes of the Order object.
During its life cycle, an object (here the Order object) goes through the following states,
and there may be some abnormal exits. An abnormal exit may occur because of some
problem in the system. When the entire life cycle is complete, it is considered a
complete transaction, as shown in the following figure. The initial and final states of the
object are also shown in the following figure.
Figure 4.2.5 State Chart Diagram
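The Order life cycle described above can be sketched as a small state machine. This is an illustration under assumptions: the state names (IDLE, REQUESTED, CONFIRMED, DISPATCHED, ABORTED) and the rule that any unexpected event leads to an abnormal exit are hypothetical choices; only the three event names come from the text.

```python
from enum import Enum


class OrderState(Enum):
    IDLE = "idle"                # initial state
    REQUESTED = "requested"
    CONFIRMED = "confirmed"
    DISPATCHED = "dispatched"    # final state: complete transaction
    ABORTED = "aborted"          # abnormal exit


# (current state, event) -> next state
TRANSITIONS = {
    (OrderState.IDLE, "send request"): OrderState.REQUESTED,
    (OrderState.REQUESTED, "confirm request"): OrderState.CONFIRMED,
    (OrderState.CONFIRMED, "dispatch order"): OrderState.DISPATCHED,
}


def next_state(state, event):
    """Return the state reached from `state` on `event`.

    An event that is not valid in the current state is treated
    as an abnormal exit (an assumption of this sketch).
    """
    return TRANSITIONS.get((state, event), OrderState.ABORTED)


def run(events):
    """Feed a sequence of events to an order that starts in IDLE."""
    state = OrderState.IDLE
    for event in events:
        state = next_state(state, event)
        if state is OrderState.ABORTED:
            break
    return state


print(run(["send request", "confirm request", "dispatch order"]))
```

Keeping the transitions in one table makes it easy to compare the code against the diagram arrow by arrow.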
CHAPTER -5
5. IMPLEMENTATION
5.1 Pseudo Code
Step 1: Import the required packages.
Step 2: Download the dataset and load it into Google Colab.
Step 3: Read the dataset and inspect the data.
Step 4: Clean the data.
Step 5: Preprocess the data.
Step 6: Save the cleaned car dataset.
Step 7: Start training the machine learning model.
Step 8: Split the features and the target into x and y respectively.
Step 9: Split the data into 80% training data and 20% testing data.
Step 10: Use the training data to fit the model and the testing data to evaluate it.
Step 11: Add a one-hot encoder and a column transformer to the model pipeline.
Step 12: Apply Linear Regression in the pipeline.
Step 13: Fit the Linear Regression model.
Step 14: If the R² score is good, use the model for prediction; otherwise fit the model again
using other random states.
Step 15: Dump the Linear Regression model to a file using pickle.
Step 16: Open PyCharm and copy the cleaned car.csv and LinearRegressionModel.pkl
files into the project.
Step 17: Read the model and the dataset, and make predictions from the web page using
Python and Flask.
5.2 Google Colab Dataset Implementation:
import pandas as pd
car=pd.read_csv("https://raw.githubusercontent.com/rajtilakls2510/car_price_predictor/master/quikr_car.csv")
car.shape
(892, 6)
car.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 892 entries, 0 to 891
Data columns (total 6 columns):
# Column Non-Null Count Dtype
0 name 892 non-null object
1 company 892 non-null object
2 year 892 non-null object
3 Price 892 non-null object
4 kms_driven 840 non-null object
5 fuel_type 837 non-null object
dtypes: object(6)
memory usage: 41.9+ KB
car['year'].unique()
array(['2007', '2006', '2018', '2014', '2015', '2012', '2013', '2016',
'2010', '2017', '2008', '2011', '2019', '2009', '2005', '2000',
'...', '150k', 'TOUR', '2003', 'r 15', '2004', 'Zest', '/-Rs',
'sale', '1995', 'ara)', '2002', 'SELL', '2001', 'tion', 'odel',
'2 bs', 'arry', 'Eon', 'o...', 'ture', 'emi', 'car', 'able', 'no.',
'd...', 'SALE', 'digo', 'sell', 'd Ex', 'n...', 'e...', 'D...',
', Ac', 'go .', 'k...', 'o c4', 'zire', 'cent', 'Sumo', 'cab',
't xe', 'EV2', 'r...', 'zest'], dtype=object)
car['Price'].unique()
array(['80,000', '4,25,000', 'Ask For Price', '3,25,000', '5,75,000',
'1,75,000', '1,90,000', '8,30,000', '2,50,000', '1,82,000',
'3,15,000', '4,15,000', '3,20,000', '10,00,000', '5,00,000',
'3,50,000', '1,60,000', '3,10,000', '75,000', '1,00,000',
'2,90,000', '95,000', '1,80,000', '3,85,000', '1,05,000',
'6,50,000', '6,89,999', '4,48,000', '5,49,000', '5,01,000',
'4,89,999', '2,80,000', '3,49,999', '2,84,999', '3,45,000',
'4,99,999', '2,35,000', '2,49,999', '14,75,000', '3,95,000',
'2,20,000', '1,70,000', '85,000', '2,00,000', '5,70,000',
'1,10,000', '4,48,999', '18,91,111', '1,59,500', '3,44,999',
'4,49,999', '8,65,000', '6,99,000', '3,75,000', '2,24,999',
'12,00,000', '1,95,000', '3,51,000', '2,40,000', '90,000',
'1,55,000', '6,00,000', '1,89,500', '2,10,000', '3,90,000',
'1,35,000', '16,00,000', '7,01,000', '2,65,000', '5,25,000',
'3,72,000', '6,35,000', '5,50,000', '4,85,000', '3,29,500',
'2,51,111', '5,69,999', '69,999', '2,99,999', '3,99,999',
'4,50,000', '2,70,000', '1,58,400', '1,79,000', '1,25,000',
'2,99,000', '1,50,000', '2,75,000', '2,85,000', '3,40,000',
'70,000', '2,89,999', '8,49,999', '7,49,999', '2,74,999',
'9,84,999', '5,99,999', '2,44,999', '4,74,999', '2,45,000',
'1,69,500', '3,70,000', '1,68,000', '1,45,000', '98,500',
'2,09,000', '1,85,000', '9,00,000', '6,99,999', '1,99,999',
'5,44,999', '1,99,000', '5,40,000', '49,000', '7,00,000', '55,000',
'8,95,000', '3,55,000', '5,65,000', '3,65,000', '40,000',
'4,00,000', '3,30,000', '5,80,000', '3,79,000', '2,19,000',
'5,19,000', '7,30,000', '20,00,000', '21,00,000', '14,00,000',
'3,11,000', '8,55,000', '5,35,000', '1,78,000', '3,00,000',
'2,55,000', '5,49,999', '3,80,000', '57,000', '4,10,000',
'2,25,000', '1,20,000', '59,000', '5,99,000', '6,75,000', '72,500',
'6,10,000', '2,30,000', '5,20,000', '5,24,999', '4,24,999',
'6,44,999', '5,84,999', '7,99,999', '4,44,999', '6,49,999',
'9,44,999', '5,74,999', '3,74,999', '1,30,000', '4,01,000',
'13,50,000', '1,74,999', '2,39,999', '99,999', '3,24,999',
'10,74,999', '11,30,000', '1,49,000', '7,70,000', '30,000',
'3,35,000', '3,99,000', '65,000', '1,69,999', '1,65,000',
'5,60,000', '9,50,000', '7,15,000', '45,000', '9,40,000',
'1,55,555', '15,00,000', '4,95,000', '8,00,000', '12,99,000',
'5,30,000', '14,99,000', '32,000', '4,05,000', '7,60,000',
'7,50,000', '4,19,000', '1,40,000', '15,40,000', '1,23,000',
'4,98,000', '4,80,000', '4,88,000', '15,25,000', '5,48,900',
'7,25,000', '99,000', '52,000', '28,00,000', '4,99,000',
'3,81,000', '2,78,000', '6,90,000', '2,60,000', '90,001',
'1,15,000', '15,99,000', '1,59,000', '51,999', '2,15,000',
'35,000', '11,50,000', '2,69,000', '60,000', '4,30,000',
'85,00,003', '4,01,919', '4,90,000', '4,24,000', '2,05,000',
'5,49,900', '3,71,500', '4,35,000', '1,89,700', '3,89,700',
'3,60,000', '2,95,000', '1,14,990', '10,65,000', '4,70,000',
'48,000', '1,88,000', '4,65,000', '1,79,999', '21,90,000',
'23,90,000', '10,75,000', '4,75,000', '10,25,000', '6,15,000',
'19,00,000', '14,90,000', '15,10,000', '18,50,000', '7,90,000',
'17,25,000', '12,25,000', '68,000', '9,70,000', '31,00,000',
'8,99,000', '88,000', '53,000', '5,68,500', '71,000', '5,90,000',
'7,95,000', '42,000', '1,89,000', '1,62,000', '35,999',
'29,00,000', '39,999', '50,500', '5,10,000', '8,60,000',
'5,00,001'], dtype=object)
car['kms_driven'].unique()
array(['45,000 kms', '40 kms', '22,000 kms', '28,000 kms', '36,000 kms',
'59,000 kms', '41,000 kms', '25,000 kms', '24,530 kms',
'60,000 kms', '30,000 kms', '32,000 kms', '48,660 kms',
'4,000 kms', '16,934 kms', '43,000 kms', '35,550 kms',
'39,522 kms', '39,000 kms', '55,000 kms', '72,000 kms',
'15,975 kms', '70,000 kms', '23,452 kms', '35,522 kms',
'48,508 kms', '15,487 kms', '82,000 kms', '20,000 kms',
'68,000 kms', '38,000 kms', '27,000 kms', '33,000 kms',
'46,000 kms', '16,000 kms', '47,000 kms', '35,000 kms',
'30,874 kms', '15,000 kms', '29,685 kms', '1,30,000 kms',
'19,000 kms', nan, '54,000 kms', '13,000 kms', '38,200 kms',
'50,000 kms', '13,500 kms', '3,600 kms', '45,863 kms',
'60,500 kms', '12,500 kms', '18,000 kms', '13,349 kms',
'29,000 kms', '44,000 kms', '42,000 kms', '14,000 kms',
'49,000 kms', '36,200 kms', '51,000 kms', '1,04,000 kms',
'33,333 kms', '33,600 kms', '5,600 kms', '7,500 kms', '26,000 kms',
'24,330 kms', '65,480 kms', '28,028 kms', '2,00,000 kms',
'99,000 kms', '2,800 kms', '21,000 kms', '11,000 kms',
'66,000 kms', '3,000 kms', '7,000 kms', '38,500 kms', '37,200 kms',
'43,200 kms', '24,800 kms', '45,872 kms', '40,000 kms',
'11,400 kms', '97,200 kms', '52,000 kms', '31,000 kms',
'1,75,430 kms', '37,000 kms', '65,000 kms', '3,350 kms',
'75,000 kms', '62,000 kms', '73,000 kms', '2,200 kms',
'54,870 kms', '34,580 kms', '97,000 kms', '60 kms', '80,200 kms',
'3,200 kms', '0,000 kms', '5,000 kms', '588 kms', '71,200 kms',
'1,75,400 kms', '9,300 kms', '56,758 kms', '10,000 kms',
'56,450 kms', '56,000 kms', '32,700 kms', '9,000 kms', '73 kms',
'1,60,000 kms', '84,000 kms', '58,559 kms', '57,000 kms',
'1,70,000 kms', '80,000 kms', '6,821 kms', '23,000 kms',
'34,000 kms', '1,800 kms', '4,00,000 kms', '48,000 kms',
'90,000 kms', '12,000 kms', '69,900 kms', '1,66,000 kms',
'122 kms', '0 kms', '24,000 kms', '36,469 kms', '7,800 kms',
'24,695 kms', '15,141 kms', '59,910 kms', '1,00,000 kms',
'4,500 kms', '1,29,000 kms', '300 kms', '1,31,000 kms',
'1,11,111 kms', '59,466 kms', '25,500 kms', '44,005 kms',
'2,110 kms', '43,222 kms', '1,00,200 kms', '65 kms',
'1,40,000 kms', '1,03,553 kms', '58,000 kms', '1,20,000 kms',
'49,800 kms', '100 kms', '81,876 kms', '6,020 kms', '55,700 kms',
'18,500 kms', '1,80,000 kms', '53,000 kms', '35,500 kms',
'22,134 kms', '1,000 kms', '8,500 kms', '87,000 kms', '6,000 kms',
'15,574 kms', '8,000 kms', '55,800 kms', '56,400 kms',
'72,160 kms', '11,500 kms', '1,33,000 kms', '2,000 kms',
'88,000 kms', '65,422 kms', '1,17,000 kms', '1,50,000 kms',
'10,750 kms', '6,800 kms', '5 kms', '9,800 kms', '57,923 kms',
'30,201 kms', '6,200 kms', '37,518 kms', '24,652 kms', '383 kms',
'95,000 kms', '3,528 kms', '52,500 kms', '47,900 kms',
'52,800 kms', '1,95,000 kms', '48,008 kms', '48,247 kms',
'9,400 kms', '64,000 kms', '2,137 kms', '10,544 kms', '49,500 kms',
'1,47,000 kms', '90,001 kms', '48,006 kms', '74,000 kms',
'85,000 kms', '29,500 kms', '39,700 kms', '67,000 kms',
'19,336 kms', '60,105 kms', '45,933 kms', '1,02,563 kms',
'28,600 kms', '41,800 kms', '1,16,000 kms', '42,590 kms',
'7,400 kms', '54,500 kms', '76,000 kms', '00 kms', '11,523 kms',
'38,600 kms', '95,500 kms', '37,458 kms', '85,960 kms',
'12,516 kms', '30,600 kms', '2,550 kms', '62,500 kms',
'69,000 kms', '28,400 kms', '68,485 kms', '3,500 kms',
'85,455 kms', '63,000 kms', '1,600 kms', '77,000 kms',
'26,500 kms', '2,875 kms', '13,900 kms', '1,500 kms', '2,450 kms',
'1,625 kms', '33,400 kms', '60,123 kms', '38,900 kms',
'1,37,495 kms', '91,200 kms', '1,46,000 kms', '1,00,800 kms',
'2,100 kms', '2,500 kms', '1,32,000 kms', 'Petrol'], dtype=object)
car['fuel_type'].unique()
array(['Petrol', 'Diesel', nan, 'LPG'], dtype=object)
backup=car.copy()
car=car[car['year'].str.isnumeric()]
car['year']=car['year'].astype(int)
car.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 842 entries, 0 to 891
Data columns (total 6 columns):
# Column Non-Null Count Dtype
0 name 842 non-null object
1 company 842 non-null object
2 year 842 non-null int32
3 Price 842 non-null object
4 kms_driven 840 non-null object
5 fuel_type 837 non-null object
dtypes: int32(1), object(5)
memory usage: 42.8+ KB
car=car[car['Price'] != "Ask For Price"]
car['Price']=car['Price'].str.replace(',','').astype(int)
car['kms_driven']=car['kms_driven'].str.split(' ').str.get(0).str.replace(',','')
car=car[car['kms_driven'].str.isnumeric()]
car['kms_driven']=car['kms_driven'].astype(int)
car=car[~car['fuel_type'].isna()]
car['name']=car['name'].str.split(' ').str.slice(0,3).str.join(' ')
car=car.reset_index(drop=True)
car=car[car['Price']<6e6].reset_index(drop=True)
car.to_csv('cleaned car.csv')
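The cleaning rules applied above can be restated for a single row in plain Python, which makes it easier to see why a given listing is kept or dropped. This is a sketch only: the function clean_row is hypothetical, it uses None where pandas would hold NaN, and it does not replace the pandas code.

```python
def clean_row(year, price, kms_driven, fuel_type):
    """Return (year, price, kms) as ints, or None if the row would be dropped.

    Mirrors the filters above for one row; missing values are passed as None.
    """
    # Keep only rows whose year is a plain number such as '2007'.
    if not year.isnumeric():
        return None
    # Listings without a real price cannot be used as a target.
    if price == "Ask For Price":
        return None
    # Missing fuel type or missing kilometres: drop the row.
    if fuel_type is None or kms_driven is None:
        return None
    # '45,000 kms' -> '45000'; a stray value such as 'Petrol' is not numeric.
    kms = kms_driven.split(" ")[0].replace(",", "")
    if not kms.isnumeric():
        return None
    # '4,25,000' -> 425000; prices of 6e6 or more are treated as outliers.
    price_value = int(price.replace(",", ""))
    if price_value >= 6_000_000:
        return None
    return int(year), price_value, int(kms)


print(clean_row("2007", "80,000", "45,000 kms", "Petrol"))
```

The sample values are taken from the unique() listings above, for example the stray 'Petrol' entry in kms_driven and the price '85,00,003'.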
#Splitting the features and target
x=car.drop(columns='Price')
y=car['Price']
x
name company year kms_driven fuel_type
0 Hyundai Santro Xing Hyundai 2007 45000 Petrol
1 Mahindra Jeep CL550 Mahindra 2006 40 Diesel
2 Hyundai Grand i10 Hyundai 2014 28000 Petrol
3 Ford EcoSport Titanium Ford 2014 36000 Diesel
4 Ford Figo Ford 2012 41000 Diesel
... ... ... ... ... ...
811 Maruti Suzuki Ritz Maruti 2011 50000 Petrol
812 Tata Indica V2 Tata 2009 30000 Diesel
813 Toyota Corolla Altis Toyota 2009 132000 Petrol
814 Tata Zest XM Tata 2018 27000 Diesel
815 Mahindra Quanto C8 Mahindra 2013 40000 Diesel
816 rows × 5 columns
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2)
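As a quick check of what test_size=0.2 means for this dataset, the split sizes can be worked out by hand. The sketch assumes that scikit-learn rounds the test count up for a fractional test_size, which is our understanding of its behaviour and agrees with the length of y_test printed later (164).

```python
import math

n_samples = 816   # rows in x, from the table above
test_size = 0.2   # fraction held out for testing

# 0.2 * 816 = 163.2, which is rounded up to a whole number of rows.
n_test = math.ceil(test_size * n_samples)
# The remaining rows are used for training.
n_train = n_samples - n_test

print(n_train, n_test)
```

Because no random_state is passed in this first call, the rows that land in each part differ from run to run, so the R² score reported for it is not reproducible exactly.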
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer
from sklearn.pipeline import make_pipeline
ohe=OneHotEncoder()
ohe.fit(x[['name','company','fuel_type']])
OneHotEncoder()
ohe.categories_
[array(['Audi A3 Cabriolet', 'Audi A4 1.8', 'Audi A4 2.0', 'Audi A6 2.0',
'Audi A8', 'Audi Q3 2.0', 'Audi Q5 2.0', 'Audi Q7', 'BMW 3 Series',
'BMW 5 Series', 'BMW 7 Series', 'BMW X1', 'BMW X1 sDrive20d',
'BMW X1 xDrive20d', 'Chevrolet Beat', 'Chevrolet Beat Diesel',
'Chevrolet Beat LS', 'Chevrolet Beat LT', 'Chevrolet Beat PS',
'Chevrolet Cruze LTZ', 'Chevrolet Enjoy', 'Chevrolet Enjoy 1.4',
'Chevrolet Sail 1.2', 'Chevrolet Sail UVA', 'Chevrolet Spark',
'Chevrolet Spark 1.0', 'Chevrolet Spark LS', 'Chevrolet Spark LT',
'Chevrolet Tavera LS', 'Chevrolet Tavera Neo', 'Datsun GO T',
'Datsun Go Plus', 'Datsun Redi GO', 'Fiat Linea Emotion',
'Fiat Petra ELX', 'Fiat Punto Emotion', 'Force Motors Force',
'Force Motors One', 'Ford EcoSport', 'Ford EcoSport Ambiente',
'Ford EcoSport Titanium', 'Ford EcoSport Trend',
'Ford Endeavor 4x4', 'Ford Fiesta', 'Ford Fiesta SXi', 'Ford Figo',
'Ford Figo Diesel', 'Ford Figo Duratorq', 'Ford Figo Petrol',
'Ford Fusion 1.4', 'Ford Ikon 1.3', 'Ford Ikon 1.6',
'Hindustan Motors Ambassador', 'Honda Accord', 'Honda Amaze',
'Honda Amaze 1.2', 'Honda Amaze 1.5', 'Honda Brio', 'Honda Brio V',
'Honda Brio VX', 'Honda City', 'Honda City 1.5', 'Honda City SV',
'Honda City VX', 'Honda City ZX', 'Honda Jazz S', 'Honda Jazz VX',
'Honda Mobilio', 'Honda Mobilio S', 'Honda WR V', 'Hyundai Accent',
'Hyundai Accent Executive', 'Hyundai Accent GLE',
'Hyundai Accent GLX', 'Hyundai Creta', 'Hyundai Creta 1.6',
'Hyundai Elantra 1.8', 'Hyundai Elantra SX', 'Hyundai Elite i20',
'Hyundai Eon', 'Hyundai Eon D', 'Hyundai Eon Era',
'Hyundai Eon Magna', 'Hyundai Eon Sportz', 'Hyundai Fluidic Verna',
'Hyundai Getz', 'Hyundai Getz GLE', 'Hyundai Getz Prime',
'Hyundai Grand i10', 'Hyundai Santro', 'Hyundai Santro AE',
'Hyundai Santro Xing', 'Hyundai Sonata Transform', 'Hyundai Verna',
'Hyundai Verna 1.4', 'Hyundai Verna 1.6', 'Hyundai Verna Fluidic',
'Hyundai Verna Transform', 'Hyundai Verna VGT',
'Hyundai Xcent Base', 'Hyundai Xcent SX', 'Hyundai i10',
'Hyundai i10 Era', 'Hyundai i10 Magna', 'Hyundai i10 Sportz',
'Hyundai i20', 'Hyundai i20 Active', 'Hyundai i20 Asta',
'Hyundai i20 Magna', 'Hyundai i20 Select', 'Hyundai i20 Sportz',
'Jaguar XE XE', 'Jaguar XF 2.2', 'Jeep Wrangler Unlimited',
'Land Rover Freelander', 'Mahindra Bolero DI',
'Mahindra Bolero Power', 'Mahindra Bolero SLE',
'Mahindra Jeep CL550', 'Mahindra Jeep MM', 'Mahindra KUV100',
'Mahindra KUV100 K8', 'Mahindra Logan', 'Mahindra Logan Diesel',
'Mahindra Quanto C4', 'Mahindra Quanto C8', 'Mahindra Scorpio',
'Mahindra Scorpio 2.6', 'Mahindra Scorpio LX',
'Mahindra Scorpio S10', 'Mahindra Scorpio S4',
'Mahindra Scorpio SLE', 'Mahindra Scorpio SLX',
'Mahindra Scorpio VLX', 'Mahindra Scorpio Vlx',
'Mahindra Scorpio W', 'Mahindra TUV300 T4', 'Mahindra TUV300 T8',
'Mahindra Thar CRDe', 'Mahindra XUV500', 'Mahindra XUV500 W10',
'Mahindra XUV500 W6', 'Mahindra XUV500 W8', 'Mahindra Xylo D2',
'Mahindra Xylo E4', 'Mahindra Xylo E8', 'Maruti Suzuki 800',
'Maruti Suzuki A', 'Maruti Suzuki Alto', 'Maruti Suzuki Baleno',
'Maruti Suzuki Celerio', 'Maruti Suzuki Ciaz',
'Maruti Suzuki Dzire', 'Maruti Suzuki Eeco',
'Maruti Suzuki Ertiga', 'Maruti Suzuki Esteem',
'Maruti Suzuki Estilo', 'Maruti Suzuki Maruti',
'Maruti Suzuki Omni', 'Maruti Suzuki Ritz', 'Maruti Suzuki S',
'Maruti Suzuki SX4', 'Maruti Suzuki Stingray',
'Maruti Suzuki Swift', 'Maruti Suzuki Versa',
'Maruti Suzuki Vitara', 'Maruti Suzuki Wagon', 'Maruti Suzuki Zen',
'Mercedes Benz A', 'Mercedes Benz B', 'Mercedes Benz C',
'Mercedes Benz GLA', 'Mini Cooper S', 'Mitsubishi Lancer 1.8',
'Mitsubishi Pajero Sport', 'Nissan Micra XL', 'Nissan Micra XV',
'Nissan Sunny', 'Nissan Sunny XL', 'Nissan Terrano XL',
'Nissan X Trail', 'Renault Duster', 'Renault Duster 110',
'Renault Duster 110PS', 'Renault Duster 85', 'Renault Duster 85PS',
'Renault Duster RxL', 'Renault Kwid', 'Renault Kwid 1.0',
'Renault Kwid RXT', 'Renault Lodgy 85', 'Renault Scala RxL',
'Skoda Fabia', 'Skoda Fabia 1.2L', 'Skoda Fabia Classic',
'Skoda Laura', 'Skoda Octavia Classic', 'Skoda Rapid Elegance',
'Skoda Superb 1.8', 'Skoda Yeti Ambition', 'Tata Aria Pleasure',
'Tata Bolt XM', 'Tata Indica', 'Tata Indica V2', 'Tata Indica eV2',
'Tata Indigo CS', 'Tata Indigo LS', 'Tata Indigo LX',
'Tata Indigo Marina', 'Tata Indigo eCS', 'Tata Manza',
'Tata Manza Aqua', 'Tata Manza Aura', 'Tata Manza ELAN',
'Tata Nano', 'Tata Nano Cx', 'Tata Nano GenX', 'Tata Nano LX',
'Tata Nano Lx', 'Tata Sumo Gold', 'Tata Sumo Grande',
'Tata Sumo Victa', 'Tata Tiago Revotorq', 'Tata Tiago Revotron',
'Tata Tigor Revotron', 'Tata Venture EX', 'Tata Vista Quadrajet',
'Tata Zest Quadrajet', 'Tata Zest XE', 'Tata Zest XM',
'Toyota Corolla', 'Toyota Corolla Altis', 'Toyota Corolla H2',
'Toyota Etios', 'Toyota Etios G', 'Toyota Etios GD',
'Toyota Etios Liva', 'Toyota Fortuner', 'Toyota Fortuner 3.0',
'Toyota Innova 2.0', 'Toyota Innova 2.5', 'Toyota Qualis',
'Volkswagen Jetta Comfortline', 'Volkswagen Jetta Highline',
'Volkswagen Passat Diesel', 'Volkswagen Polo',
'Volkswagen Polo Comfortline', 'Volkswagen Polo Highline',
'Volkswagen Polo Highline1.2L', 'Volkswagen Polo Trendline',
'Volkswagen Vento Comfortline', 'Volkswagen Vento Highline',
'Volkswagen Vento Konekt', 'Volvo S80 Summum'], dtype=object),
array(['Audi', 'BMW', 'Chevrolet', 'Datsun', 'Fiat', 'Force', 'Ford',
'Hindustan', 'Honda', 'Hyundai', 'Jaguar', 'Jeep', 'Land',
'Mahindra', 'Maruti', 'Mercedes', 'Mini', 'Mitsubishi', 'Nissan',
'Renault', 'Skoda', 'Tata', 'Toyota', 'Volkswagen', 'Volvo'],
dtype=object),
array(['Diesel', 'LPG', 'Petrol'], dtype=object)]
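To show what the encoder does with these category lists, the following plain-Python sketch encodes one value by hand. The helper one_hot is hypothetical and only illustrates the idea; the project itself uses scikit-learn's OneHotEncoder.

```python
def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in categories:
        # This sketch rejects values that were not seen when the
        # category list was built.
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


# Fuel categories exactly as listed in ohe.categories_ above.
fuel_categories = ["Diesel", "LPG", "Petrol"]

print(one_hot("Petrol", fuel_categories))  # [0, 0, 1]
print(one_hot("Diesel", fuel_categories))  # [1, 0, 0]
```

Fitting the encoder on all of x before splitting, as done above, ensures that every car name in the test split already has a column, so no unknown category is met at prediction time.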
column_trans=make_column_transformer((OneHotEncoder(categories=ohe.categories_),
['name','company','fuel_type']),
remainder='passthrough')
lr=LinearRegression()
pipe=make_pipeline(column_trans,lr)
pipe.fit(x_train,y_train)
Pipeline(steps=[('columntransformer',
ColumnTransformer(remainder='passthrough',
transformers=[('onehotencoder',
OneHotEncoder(categories=[array(['Audi A3 Cabriolet', 'Audi A4 1.8', 'Audi A4 2.0',
'Audi A6 2.0',
'Audi A8', 'Audi Q3 2.0', 'Audi Q5 2.0', 'Audi Q7', 'BMW 3 Series',
'BMW 5 Series', 'BMW 7 Series', 'BMW X1', 'BMW X1 sDrive20d',
'BMW X1 xDrive20d', 'Chevrolet Beat', 'Chevrolet Beat...
array(['Audi', 'BMW', 'Chevrolet', 'Datsun', 'Fiat', 'Force', 'Ford',
'Hindustan', 'Honda', 'Hyundai', 'Jaguar', 'Jeep', 'Land',
'Mahindra', 'Maruti', 'Mercedes', 'Mini', 'Mitsubishi', 'Nissan',
'Renault', 'Skoda', 'Tata', 'Toyota', 'Volkswagen', 'Volvo'],
dtype=object),
array(['Diesel', 'LPG', 'Petrol'], dtype=object)]),
['name', 'company','fuel_type'])])),
('linearregression', LinearRegression())])
y_pred=pipe.predict(x_test)
y_pred
y_test
322 210000
204 500000
42 284999
606 500000
513 159000
...
801 465000
711 200000
731 300000
757 150000
379 130000
Name: Price, Length: 164, dtype: int32
r2_score(y_test,y_pred)
0.6863234123258164
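The R² score compares the squared error of the model with the squared error of always predicting the mean price: R² = 1 - SS_res / SS_tot. A minimal sketch of that formula follows; the function r2 and the toy numbers are illustrative assumptions and are not values from the dataset.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    # Squared error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Squared error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy prices, not taken from the dataset.
y_true = [100, 200, 300]
y_pred = [110, 190, 310]

print(r2(y_true, y_pred))
```

A score of 1 means perfect predictions and a score of 0 means the model is no better than predicting the mean, so the value of about 0.69 above indicates that the model explains roughly 69% of the variance in price on that particular test split.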
# checking for maximum r2_score
scores=[]
for i in range(1000):
    x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=i)
    lr=LinearRegression()
    pipe=make_pipeline(column_trans,lr)
    pipe.fit(x_train,y_train)
    y_pred=pipe.predict(x_test)
    scores.append(r2_score(y_test,y_pred))
import numpy as np
np.argmax(scores)
906
scores[np.argmax(scores)]
0.7768125045875028
#Training the model using highest r2_score
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=np.argmax(scores))
lr=LinearRegression()
pipe=make_pipeline(column_trans,lr)
pipe.fit(x_train,y_train)
y_pred=pipe.predict(x_test)
r2_score(y_test,y_pred)
0.8456515104452564
#predicting the price by taking input features
pipe.predict(pd.DataFrame([['Maruti Suzuki Swift','Maruti',2019,100,'Petrol']],
columns=['name','company','year','kms_driven','fuel_type']))
#prediction
array([459113.49353657])
# dumping the LinearRegressionModel.pkl file using pickle for further development process
import pickle
pickle.dump(pipe,open('LinearRegressionModel.pkl','wb'))
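The dump and the later load form a round trip, and using with blocks ensures the file is closed after each step. The sketch below shows the pattern with a small dictionary as a hypothetical stand-in for the fitted pipeline and a temporary file path; neither is part of the project.

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for the fitted pipeline; any picklable
# object is saved and restored the same way.
model = {"intercept": 1000.0, "coef": [2.5, -0.1]}

path = os.path.join(tempfile.mkdtemp(), "model.pkl")

# Write the object to disk in binary mode.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Read it back, as the web application does at start-up.
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == model)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.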
5.3 Code Snippets
1. home.html
<!doctype html>
<html lang="en">
<head>
<!-- Required meta tags -->
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<link rel="stylesheet" href="static/css/style.css">
<!-- Bootstrap CSS -->
<link rel="stylesheet"
href="https://cdn.jsdelivr.net/npm/bootstrap@4.1.3/dist/css/bootstrap.min.css"
integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO"
crossorigin="anonymous">
<title>Car Price Predictor</title>
</head>
<body class="backgroundColor">
<div class=" d-flex flex-column flex-md-row align-items-center p-3 px-md-4 mb-3
navbar-light" style="background-color: #e0f2f1;">
<h5 class="my-0 mr-md-auto font-weight-normal"><b><h4>CAR PRICE
PREDICTOR</h4></b></h5>
<nav class="my-2 my-md-0 mr-md-3 ">
<a class="p-2 text-dark" href="{{url_for('home')}}"><b>Home</b></a>
</nav>
<a class="btn btn-outline-primary" href="/logout">Log out</a>
</div>
<div class="container">
<div class="row">
<div class="card mt-50" style="width:100%;height:100%">
<div class="card-header">
<div class="col-12" style="text-align:center">
<h1>Welcome to Car Price Predictor</h1>
</div>
</div>
<div class="card-body">
<form class="form" method="post" >
<div class="col-10 form-group" style="text-align: center">
<label> <b>Select company: </b></label>
<select class="selectpicker form-control" id="company" name="company"
required="1" onchange="load_car_models(this.id,'car_model')">
{% for company in companies %}
<option value="{{company}}">{{company}} </option>
{% endfor %}
</select>
</div>
<div class="col-10 form-group" style="text-align: center">
<label> <b>Select Model: </b></label>
<select class="selectpicker form-control" id="car_model" name="car_model"
required="1">
</select>
</div>
<div class="col-10 form-group" style="text-align: center">
<label> <b>Select Year of Purchase: </b></label>
<select class="selectpicker form-control" id="year" name="year" required="1">
{% for year in years %}
<option value="{{year}}">{{year}} </option>
{% endfor %}
</select>
</div>
<div class="col-10 form-group" style="text-align: center">
<label> <b>Select Fuel Type: </b></label>
<select class="selectpicker form-control" id="fuel_type" name="fuel_type"
required="1">
{% for fuel_type in fuel_types %}
<option value="{{fuel_type}}">{{fuel_type}} </option>
{% endfor %}
</select>
</div>
<div class="col-10 form-group" style="text-align: center">
<label> <b>Kilometers travelled: </b></label>
<input class="form-control" type="text" id="kms_driven" name="kms_driven"
placeholder="Enter no.of kms travelled" >
</input>
</div>
<div class="col-10 form-group" style="text-align: center">
<button class="btn btn-primary btn-block btn-lg" onclick="send_data()"
value="Predict">Predict Price</button>
</div>
</form>
<br>
<div class="row">
<div class="col-12" style="text-align: center">
<h3><span id="prediction"></span> </h3>
</div>
</div>
</div>
</div>
</div>
</div>
<script>
function load_car_models(company_id,car_model_id)
{
var company= document.getElementById(company_id);
var car_model= document.getElementById(car_model_id);
car_model.value="";
car_model.innerHTML="";
{% for company in companies %}
if(company.value == "{{company}}" )
{
{% for model in car_models %}
{% if company in model %}
var newOption = document.createElement("option");
newOption.value="{{ model }}";
newOption.innerHTML="{{ model }}";
car_model.options.add(newOption);
{% endif %}
{% endfor %}
}
{% endfor %}
}
function form_handler()
{
event.preventDefault();
}
function send_data()
{
document.querySelector('form').addEventListener('submit', form_handler);
var fd= new FormData(document.querySelector('form'));
var xhr=new XMLHttpRequest();
xhr.open('POST', '/predict', true);
document.getElementById("prediction").innerHTML="wait! predicting price...";
xhr.onreadystatechange= function()
{
if(xhr.readyState == XMLHttpRequest.DONE)
{
document.getElementById("prediction").innerHTML="The Predicted Price is: "+
xhr.responseText + " Rs/-";
}
}
xhr.onload=function(){};
xhr.send(fd);
}
</script>
<!-- Optional JavaScript -->
<!-- jQuery first, then Popper.js, then Bootstrap JS -->
<script src="https://code.jquery.com/jquery-3.3.1.slim.min.js"
integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo"
crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/popper.js@1.14.3/dist/umd/popper.min.js"
integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49"
crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/bootstrap@4.1.3/dist/js/bootstrap.min.js"
integrity="sha384-ChfqqxuZUCnJSK3+MXmPNIyE6ZbWh2IMqE241rYiqJxyMiZ6OW/JmZQ5stwEULTy"
crossorigin="anonymous"></script>
</body>
</html>
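The form above posts the fields company, car_model, year, fuel_type and kms_driven, while the pipeline was trained on the columns name, company, year, kms_driven and fuel_type. The sketch below shows one way the server side could reorder and convert the posted values before prediction. The function form_to_row is hypothetical and is not taken from the project's Flask code.

```python
# Column order used when the pipeline was trained.
COLUMNS = ["name", "company", "year", "kms_driven", "fuel_type"]


def form_to_row(form):
    """Map posted form fields to one row in the training column order."""
    return [
        form["car_model"],        # the form names the model field 'car_model'
        form["company"],
        int(form["year"]),        # form values arrive as strings
        int(form["kms_driven"]),
        form["fuel_type"],
    ]


posted = {
    "company": "Maruti",
    "car_model": "Maruti Suzuki Swift",
    "year": "2019",
    "fuel_type": "Petrol",
    "kms_driven": "100",
}

row = form_to_row(posted)
print(dict(zip(COLUMNS, row)))
```

The resulting row has the same shape as the example passed to pipe.predict in section 5.2, so it could be wrapped in a one-row DataFrame with COLUMNS as the column names.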
2. App.py
import pandas as pd
#from flask import Flask, render_template, request, url_for,redirect,session
import pickle
import numpy as np
from flask import *
import flask_login
import os
from num2words import num2words
import mysql.connector

# Load the trained regression model and the cleaned dataset.
model = pickle.load(open("LinearRegressionModel.pkl", 'rb'))
car = pd.read_csv("cleaned car.csv")

app = Flask(__name__)
app.secret_key = os.urandom(24)

# MySQL connection used by the application.
conn = mysql.connector.connect(
    host='localhost',
    user='root',
    password='Password123@',
    port='3306',
    database='database'
)
mycursor = conn.cursor()


@app.route('/')
def login():
    # Send users who are already logged in straight to the home page.
    if 'user_id' in session:
        return redirect('/home')
    else:
        return render_template('login.html')


@app.route('/register')
def register():
    return render_template('register.html')
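
The front-end script posts the form to /predict and displays the response text as the price, but the handler for that route is not listed here. The following is a minimal sketch of what such a handler might do with the posted fields, not the project's actual code. The field names in FIELDS, the StubModel class, and the helper names build_row and predict_price are all hypothetical and chosen only for illustration; the stub stands in for the pickled regression model so the sketch runs without Flask or scikit-learn.

```python
# Sketch only: the field names and the stand-in model are hypothetical,
# not taken from the report.
FIELDS = ("company", "car_model", "year", "fuel_type", "kilo_driven")


class StubModel:
    """Stand-in for the pickled regression model, using a toy pricing rule."""

    def predict(self, rows):
        # One price per input row; a fixed rule, not a trained regression.
        return [500000.0 - 2.5 * row["kilo_driven"] for row in rows]


def build_row(form):
    """Convert posted form strings into one typed feature row."""
    missing = [name for name in FIELDS if name not in form]
    if missing:
        raise ValueError("missing form fields: " + ", ".join(missing))
    return {
        "company": form["company"],
        "car_model": form["car_model"],
        "year": int(form["year"]),
        "fuel_type": form["fuel_type"],
        "kilo_driven": int(form["kilo_driven"]),
    }


def predict_price(model, form):
    """Return the predicted price as the plain-text body sent back to the page."""
    row = build_row(form)
    price = model.predict([row])[0]
    return str(round(price, 2))
```

In a Flask route, the dictionary passed as form would presumably come from request.form, and the returned string would be the response body that the page inserts between "The Predicted Price is: " and " Rs/-".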
Car Price prediction final pdf1.docx

  • 1. lOMoAR cPSD|24598226 Downloaded by Rakesh Swain (srakeshswain005@gmail.com) Car Price prediction final pdf Computer science (AJ Institute of Engineering and Technology) Studocu is not sponsored or endorsed by any college or university
  • 2. lOMoAR cPSD|24598226 Downloaded by Rakesh Swain (srakeshswain005@gmail.com) A Dinesh A Rahul (19BD1A05C1) (19BD1A05C5) E Sri Kumar (19BD1A05CJ) G Pranav (19BD1A05CK) Under the guidance of Ms. NASREEN SULTANA Assistant Professor Department of CSE A Mini Project Report on CAR PRICE PREDICTION USING LINEAR REGRESSION Submitted to Jawaharlal Nehru Technological University, Hyderabad in partial fulfillment of requirements for the award of the degree of BACHELOR OF TECHNOLOGY In COMPUTER SCIENCE AND ENGINEERING By Department of Computer Science and Engineering KESHAV MEMORIAL INSTITUTE OF TECHNOLOGY Approved by AICTE, Affiliated to JNTUH 3-5-1206, Narayanaguda, Hyderabad – 500029 2022-2023
  • 3. lOMoAR cPSD|24598226 Downloaded by Rakesh Swain (srakeshswain005@gmail.com) KESHAV MEMORIAL INSTITUTE OF TECHNOLOGY (Accredited by NBA & NAAC, Approved By A.I.C.T.E., Reg by Govt of Telangana State & Affiliated to JNTU, Hyderabad) DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING CERTIFICATE This is to certify that the project entitled CAR PRICE PREDICTION USING LINEAR REGRESSION being submitted by A. Dinesh A. Rahul E. Sri Kumar G. Pranav (19BD1A05C1) (19BD1A05C5) (19BD1A05CJ) (19BD1A05CK) In partial fulfilment for the award of Bachelor of Technology in Computer Science and Engineering affiliated to the Jawaharlal Nehru Technological University, Hyderabad during the year 2022-23. Internal Guide Head of the Department (Ms. Nasreen Sultana) (Dr. S. Padmaja) Submitted for Viva Voice Examination held on External Examiner Unit of Keshav Memorial Educational Society #: 3-5-1026 Narayanaguda Hyderabad 500029. 040-3261407 www.kmit.in e-mail: principal@kmit.in
  • 4. lOMoAR cPSD|24598226 Downloaded by Rakesh Swain (srakeshswain005@gmail.com) Vision of KMIT Producing quality graduates trained in the latest technologies and related tools and striving to make India a world leader in software and hardware products and services. To achieve academic excellence by imparting in depth knowledge to the students, facilitating research activities and catering to the fast growing and ever- changing industrial demands and societal needs. Mission of KMIT  To provide a learning environment that inculcates problem solving skills, professional, ethical responsibilities, lifelong learning through multi modal platforms and prepare students to become successful professionals.  To establish industry institute Interaction to make students ready for the industry.  To provide exposure to students on latest hardware and software tools.  To promote research based projects/activities in the emerging areas of technology convergence.  To encourage and enable students to not merely seek jobs from the industry but also to create new enterprises.  To induce a spirit of nationalism which will enable the student to develop, understand lndia's challenges and to encourage them to develop effective solutions.  To support the faculty to accelerate their learning curve to deliver excellent service to students.
  • 5. lOMoAR cPSD|24598226 Downloaded by Rakesh Swain (srakeshswain005@gmail.com) Vision & Mission of CSE Vision of the CSE To be among the region's premier teaching and research Computer Science and Engineering departments producing globally competent and socially responsible graduates in the most conducive academic environment. Mission of the CSE  To provide faculty with state of the art facilities for continuous professional development and research, both in foundational aspects and of relevance to emerging computing trends.  To impart skills that transform students to develop technical solutions for societal needs and inculcate entrepreneurial talents.  To inculcate an ability in students to pursue the advancement of knowledge in various specializations of Computer Science and Engineering and make them industry-ready.  To engage in collaborative research with academia and industry and generate adequate resources for research activities for seamless transfer of knowledge resulting in sponsored projects and consultancy.  To cultivate responsibility through sharing of knowledge and innovative computing solutions that benefit the society-at-large.  To collaborate with academia, industry and community to set high standards in academic excellence and in fulfilling societal responsibilities.
  • 6. lOMoAR cPSD|24598226 Downloaded by Rakesh Swain (srakeshswain005@gmail.com) PROGRAM OUTCOMES (POs) 1. Engineering Knowledge: Apply the knowledge of mathematics, science, engineering fundamentals and an engineering specialization to the solution of complex engineering problems. 2. Problem analysis: Identify formulate, review research literature, and analyse complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences 3. Design/development of solutions: Design solutions for complex engineering problem and design system component or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural societal, and environmental considerations. 4. Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions. 5. Modern tool usage: Create select, and, apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modelling to complex engineering activities with an understanding of the limitations. 6. The engineer and society: Apply reasoning informed by the contextual knowledge to societal, health, safety. legal und cultural issues and the consequent responsibilities relevant to professional engineering practice. 7. Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts and demonstrate the knowledge of, and need for sustainable development. 8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice. 9. Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams and in multidisciplinary settings. 10. 
Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation make effective presentations, and give and receive clear instructions. 11. Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one's own work, as a member and leader in a team, to manage projects and in multidisciplinary environments. 12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change.
  • 7. lOMoAR cPSD|24598226 Downloaded by Rakesh Swain (srakeshswain005@gmail.com) PROGRAM SPECIFIC OUTCOMES (PSOs) PSO1: An ability to analyse the common business functions to design and develop appropriate Information Technology solutions for social upliftment. PSO2: Shall have expertise on the evolving technologies like Python, Machine Learning, Deep Learning, Internet of Things (IOT), Data Science, Full stack development, Social Networks, Cyber Security, Big Data, Mobile Apps, CRM, ERP etc.. PROGRAM EDUCATIONAL OBJECTIVES (PEOs) PEO1: Graduates will have successful careers in computer related engineering fields or will be able to successfully pursue advanced higher education degrees. PEO2: Graduates will try and provide solutions to challenging problems in their profession by applying computer engineering principles. PEO3: Graduates will engage in life-long learning and professional development by rapidly adapting changing work environment. PEO4: Graduates will communicate effectively, work collaboratively and exhibit high levels of professionalism and ethical responsibility.
  • 8. PROJECT OUTCOMES
    P1: To provide a user-friendly environment.
    P2: To predict the dependent variable for the input data (features) given by the user.
    P3: To give an accurate price for used cars.
    P4: To develop a web application using the Flask framework.
    Rating scale: LOW - 1, MEDIUM - 2, HIGH - 3
    PROJECT OUTCOMES MAPPING PROGRAM OUTCOMES
    PO: PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12
    P1: 3 3 2 2
    P2: 2 2 2 2 2 1
    P3: 2 2 3 2 2 2
    P4: 1 2 3 2 2 1
    PROJECT OUTCOMES MAPPING PROGRAM SPECIFIC OUTCOMES
    PSO: PSO1 PSO2
    P1: 1
    P2: 3
  • 9. P3: 2 3
    P4: 1 3
    PROJECT OUTCOMES MAPPING PROGRAM EDUCATIONAL OBJECTIVES
    PEO: PEO1 PEO2 PEO3 PEO4
    P1: 1 2
    P2: 2
    P3: 1 2
    P4: 1 2 2
  • 10. DECLARATION
    We hereby declare that the project report entitled "CAR PRICE PREDICTION USING LINEAR REGRESSION (M.L)" was carried out in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology in Computer Science and Engineering, affiliated to Jawaharlal Nehru Technological University, Hyderabad. This project has not been submitted anywhere else.
    ABBI DINESH (19BD1A05C1)
    ASAD RAHUL (19BD1A05C5)
    ERRAGALA SRI KUMAR (19BD1A05CJ)
    GYARA PRANAV KUMAR (19BD1A05CK)
  • 11. ACKNOWLEDGMENT
    We take this opportunity to thank all the people who have rendered their full support to our project work. We render our thanks to Dr. Maheshwar Dutta, B.E., M.Tech., Ph.D., Principal, who encouraged us to do the project.
    We are grateful to Mr. Neil Gogte, Director, for facilitating all the amenities required for carrying out this project.
    We express our sincere gratitude to Mr. S. Nitin, Director, and Dr. D. Jaya Prakash, Dean Academics, for providing an excellent environment in the college.
    We are also thankful to Dr. S. Padmaja, Head of the Department, for providing us with both time and amenities to make this project a success within the given schedule.
    We are also thankful to our guide, Ms. Nasreen Sultana, for her valuable guidance and encouragement throughout the project work.
    We would like to thank the entire CSE Department faculty, who helped us directly and indirectly in the completion of the project.
    We sincerely thank our friends and family for their constant motivation during the project work.
    ABBI DINESH (19BD1A05C1)
    ASAD RAHUL (19BD1A05C5)
    ERRAGALA SRI KUMAR (19BD1A05CJ)
    GYARA PRANAV KUMAR (19BD1A05CK)
  • 12. CONTENT
    ABSTRACT
    LIST OF FIGURES
    LIST OF TABLES
    CHAPTERS
    1. INTRODUCTION
       1.1. Machine Learning
       1.2. What is Machine Learning
       1.3. Types of Machine Learning
       1.4. Linear Regression
       1.5. Objective & Problem Statement
       1.6. Purpose of Project
       1.7. Architecture Diagram
       1.8. Project Goal
    2. SOFTWARE REQUIREMENTS SPECIFICATIONS
       2.1. Requirements Specification Document
       2.2. Functional Requirements
       2.3. Non-Functional Requirements
       2.4. Software Requirements
       2.5. Hardware Requirements
       2.6. Requirement Analysis
       2.7. Test Construction and Verification
       2.8. Test Execution and Bug Reporting
       2.9. Final Testing and Implementation
       2.10. Post Implementation
       2.11. Technologies Used
  • 13. 3. LITERATURE SURVEY
       3.1. Proposed Model
       3.2. Paper Work
       3.3. Related Work
    4. SYSTEM DESIGN
       4.1. Introduction to UML
       4.2. UML Diagrams
          4.2.1. Use Case Diagram
          4.2.2. Sequence Diagram
          4.2.3. Class Diagram
          4.2.4. System Design
          4.2.5. State Chart Diagram
    5. IMPLEMENTATION
       5.1. Pseudo Code
       5.2. Data Cleaning using Google Colab
       5.2. Code Snippets
    6. TESTING
       6.1. Introduction to Testing
       6.2. Test Cases
    7. SCREENSHOTS
       7.1. Layout of Testing Platform
       7.2. Log & Reference
       7.3. UI of Web Application
    8. FURTHER ENHANCEMENTS
    9. CONCLUSION
    10. REFERENCES
  • 14. ABSTRACT
    This study addresses the problem of predicting the price of used cars. With the motivation of helping buyers and sellers, we developed a solution that gives an appropriate estimate of the value of one's car using machine learning techniques, saving both time and money. Car price prediction has been a research area of high interest, because it otherwise requires noticeable effort and the knowledge of a field expert. A considerable number of distinct attributes must be examined for a reliable and accurate prediction. The production of cars has been steadily increasing in the past decade, with over 70 million passenger cars produced in the year 2016. This has given rise to the used car market, which has become a booming industry in its own right. The recent advent of online portals has increased the need for both the customer and the seller to be better informed about the trends and patterns that determine the value of a used car in the market. To build a model for predicting the price of used cars, we applied one machine learning technique, Linear Regression. In linear regression there can be multiple independent variables but only one dependent variable, whose actual and predicted values are compared to assess the accuracy of the results. Our paper proposes a system in which price is the dependent variable to be predicted, derived from factors such as kilometers driven, year of purchase, car company, car model, and fuel type.
    Keywords: Car Price Prediction, Linear Regression, Machine Learning, dependent variable.
  • 15. LIST OF FIGURES
    1.1 Machine Learning
    1.2 Machine Learning & Traditional Programming
    1.3 Types of Machine Learning
    1.3.1 Data Set of Supervised Learning
    1.3.1.2 Types of Supervised Learning
    1.3.2 Unsupervised Learning
    1.3.2.1 Types of Unsupervised Learning
    1.3.4 Reinforcement Learning
    1.4 Linear Regression
    1.7 Architecture of Linear Regression
    3.8.1 Google Colab
    4.2.1 Use Case Diagram - UML
    4.2.2 Sequence Diagram - UML
    4.2.3 Class Diagram - UML
    4.2.4 System Design - UML
    4.2.5 State Chart Diagram - UML
    7.1 Selenium IDE Testing Platform
    7.2 Log & Reference using Selenium IDE
  • 16. 7.3 Register page of Web Application - UI
    7.4 Login page of Web Application - UI
    7.5 Home page of Web Application - UI
    7.6 Displaying available car companies - UI
    7.7 Displaying suitable car models - UI
    7.8 Displaying available years - UI
    7.9 Displaying available fuel types - UI
    7.10 Displaying predicted price - UI
  • 17. LIST OF TABLES
    6.2 Test Case for Web Application
    6.2.1 Launching web application
    6.2.2 Registration of user details
    6.2.3 Login positive test case
    6.2.4 Login negative test case
    6.2.5 Displaying attributes
    6.2.6 Selecting attributes
    6.2.7 Selecting attributes for correct attributes
    6.2.8 Selecting attributes for incorrect attributes
    6.2.9 Home button test case
    6.2.10 Logout button test case
  • 18. CHAPTER 1
  • 19. 1. INTRODUCTION
    1.1 MACHINE LEARNING
    Machine Learning is the field of study that gives computers the capability to learn without being explicitly programmed. ML is one of the most exciting technologies one could come across. As is evident from the name, it gives the computer the quality that makes it more similar to humans: the ability to learn. Machine learning is actively used today, perhaps in many more places than one would expect.
    Figure 1.1 Machine Learning
    1.2 What is Machine Learning?
    Arthur Samuel, a pioneer in the field of artificial intelligence and computer gaming, coined the term "Machine Learning". He defined machine learning as the "field of study that gives computers the capability to learn without being explicitly programmed". In layman's terms, Machine Learning (ML) can be explained as automating and improving the learning process of computers based on their experience, without the rules being explicitly programmed by a human. The process starts with feeding in good-quality data and then training the machines (computers) by building machine learning models using the data and different algorithms. The choice of algorithm depends on what type of data we have and what kind of task we are trying to automate.
    Example: Training of students during exams. While preparing for exams, students do not simply cram the subject but try to learn it with complete understanding. Before the examination, they feed their machine (brain) with a good amount of high-quality data (questions and answers from different books, teachers' notes, or online video lectures).
  • 20. In effect, they are training their brain with both input and output, that is, with the approach or logic needed to solve different kinds of questions. Each time they solve practice test papers, they measure their performance (accuracy or score) by comparing their answers with the answer key. Gradually the performance keeps improving, and they gain confidence in the adopted approach. This is how models are built: the machine is trained with data (both inputs and outputs are given to the model), and when the time comes it is tested on data with inputs only. The model's score is obtained by comparing its answers with the actual outputs, which were not fed to it during training. Researchers are working assiduously to improve algorithms and techniques so that these models perform even better.
    1.2.1 Basic Difference between ML and Traditional Programming
    Traditional Programming: We feed in DATA (input) + PROGRAM (logic), run it on the machine, and get the output.
    Machine Learning: We feed in DATA (input) + OUTPUT, run it on the machine during training, and the machine creates its own program (logic), which can be evaluated during testing.
    Figure 1.2 Machine Learning & Traditional Programming
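The contrast drawn in 1.2.1 can be illustrated with a minimal sketch. The pricing rule, the toy numbers, and the function names `handwritten_rule` and `learn_rule` are all assumptions made up for this illustration; they are not taken from the project code.

```python
# Traditional programming: a person writes the rule by hand.
# The rule itself (10 - 0.5 * km) is a made-up illustration.
def handwritten_rule(km_driven):
    return 10.0 - 0.5 * km_driven


# Machine learning: the rule is estimated from (input, output) pairs.
def learn_rule(inputs, outputs):
    """Fit output = intercept + slope * input by simple least squares."""
    n = len(inputs)
    mean_x = sum(inputs) / n
    mean_y = sum(outputs) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(inputs, outputs))
    variance = sum((x - mean_x) ** 2 for x in inputs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


# Training data: both inputs and outputs are shown to the learner.
train_km = [1.0, 2.0, 3.0, 4.0]
train_price = [9.5, 9.0, 8.5, 8.0]
learned_rule = learn_rule(train_km, train_price)

# Test data: only the input is given; the held-back output is used for scoring.
test_km, test_price = 6.0, 7.0
prediction = learned_rule(test_km)
error = abs(prediction - test_price)
print(f"predicted {prediction:.2f}, actual {test_price:.2f}, error {error:.2f}")
```

In this toy case the learned rule reproduces the handwritten one because the training data were generated from it without noise; with real data the two would generally differ.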
  • 21. 1.3 Types of Machine Learning
    A machine is said to be learning from past experience (the data fed in) with respect to some class of tasks if its performance in a given task improves with that experience. For example, assume that a machine has to predict whether a customer will buy a specific product, say "Antivirus", this year. The machine does so by looking at previous knowledge or past experience, i.e. the data on products the customer has bought each year. If the customer buys Antivirus every year, then there is a high probability that the customer will buy it this year as well. This is how machine learning works at the basic conceptual level.
    Figure 1.3 Types of Machine Learning
    1.3.1 Supervised Learning
    Supervised learning is when the model is trained on a labeled dataset. A labeled dataset is one that has both input and output parameters. In this type of learning, both the training and validation datasets are labeled, as shown in the figures below.
    Figure 1.3.1 Data Set
    Both of the above figures show labeled datasets, as follows:
    Figure A: It is a dataset of a shopping store that is useful in predicting whether a
  • 22. customer will purchase a particular product under consideration or not, based on his/her gender, age, and salary.
    Input: Gender, Age, Salary
    Output: Purchased, i.e. 0 or 1; 1 means the customer will purchase and 0 means the customer will not purchase it.
    Figure B: It is a meteorological dataset that serves the purpose of predicting wind speed based on different parameters.
    Input: Dew Point, Temperature, Pressure, Relative Humidity, Wind Direction
    Output: Wind Speed
    1.3.1 Types of Supervised Learning:
    Figure 1.3.1 Types of Supervised Learning
    A. Classification: It is a supervised learning task where the output has defined labels (discrete values). For example, in Figure A above, the output Purchased has defined labels, i.e. 0 or 1; 1 means the customer will purchase, and 0 means the customer will not purchase. The goal here is to predict discrete values belonging to a particular class and to evaluate the predictions on the basis of accuracy. Classification can be either binary or multi-class. In binary classification, the model predicts either 0 or 1 (yes or no), whereas in multi-class classification the model chooses among more than two classes. Example: Gmail classifies mail into several classes such as social, promotions, updates, and forums.
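Evaluating a classifier "on the basis of accuracy" can be sketched as below. The label lists are invented for illustration, and `accuracy` is a hypothetical helper, not a function from the project.

```python
def accuracy(actual, predicted):
    """Fraction of predictions that match the actual labels."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Binary case (hypothetical labels): 1 = purchased, 0 = not purchased.
actual_purchase = [1, 0, 1, 1, 0, 0, 1, 0]
predicted_purchase = [1, 0, 0, 1, 0, 1, 1, 0]
binary_accuracy = accuracy(actual_purchase, predicted_purchase)

# Multi-class case (hypothetical labels): the same measure applies unchanged.
actual_tab = ["social", "promotions", "updates", "forums"]
predicted_tab = ["social", "updates", "updates", "forums"]
multiclass_accuracy = accuracy(actual_tab, predicted_tab)

print(binary_accuracy, multiclass_accuracy)
```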
  • 23. B. Regression: It is a supervised learning task where the output has a continuous value. For example, in Figure B above, the output Wind Speed does not take discrete values but is continuous within a particular range. The goal here is to predict a value as close to the actual output value as the model can, and evaluation is done by calculating the error. The smaller the error, the greater the accuracy of the regression model.
    Examples of supervised learning algorithms:
    - Linear Regression
    - Logistic Regression
    - Nearest Neighbor
    - Gaussian Naive Bayes
    - Decision Trees
    - Support Vector Machine (SVM)
    - Random Forest
    1.3.2 Unsupervised Learning:
    Unsupervised machine learning analyzes and clusters unlabeled datasets using machine learning algorithms. These algorithms find hidden patterns in the data without any human intervention, i.e., we do not give outputs to the model. The model being trained has only input parameter values and discovers the groups or patterns on its own. The dataset in Figure A is mall data that contains information about the clients who subscribe to the mall. Once subscribed, clients are given a membership card, so the mall has complete information about each customer and his/her every purchase. Using this data and unsupervised learning techniques, the mall can easily group clients based on the parameters we feed in.
    Figure 1.3.2 Unsupervised Learning
  • 24. The input to unsupervised learning models is as follows:
    Unstructured data: May contain noisy (meaningless) data, missing values, or unknown data.
    1.3.2.1 Types of Unsupervised Learning are as follows:
    Figure 1.3.2.1 Types of Unsupervised Learning
    Clustering: Broadly, this technique is applied to group data based on the patterns, such as similarities or differences, that the model finds. These algorithms are used to process raw, unclassified data objects into groups. For example, in the above figure we have not given output parameter values, so this technique will be used to group clients based on the input parameters provided by the data.
    Association: This is a rule-based ML technique that finds useful relations between the parameters of a large dataset. It is mainly used for market basket analysis, which helps to better understand the relationship between different products. For example, shopping stores use algorithms based on this technique to find the relationship between the sales of one product and the sales of another, based on customer behavior: if a customer buys milk, then he may also buy bread, eggs, or butter. Once trained well, such models can be used to increase sales by planning different offers.
    Some algorithms:
    - K-Means Clustering
    - DBSCAN: Density-Based Spatial Clustering of Applications with Noise
    - BIRCH: Balanced Iterative Reducing and Clustering using Hierarchies
    - Hierarchical Clustering
    1.3.3 Semi-supervised Learning:
    As the name suggests, it works between supervised and unsupervised techniques. We use these techniques when dealing with data of which a small part is labeled and the remaining large portion is unlabeled. We can use unsupervised techniques to predict labels and then feed these labels to supervised techniques.
  • 25. This technique is mostly applicable to image datasets, where usually not all images are labeled.
    1.3.4 Reinforcement Learning:
    In this technique, the model keeps improving its performance by using reward feedback to learn the behavior or pattern. These algorithms are specific to a particular problem, e.g. the Google self-driving car, or AlphaGo, where a bot competes with humans and even with itself to become a better and better player of the game of Go. Each time we feed in data, the model learns and adds the data to its knowledge, which is the training data. So the more it learns, the better trained and hence more experienced it becomes.
    Figure 1.3.4 Reinforcement Learning
    - Agents observe input.
    - An agent performs an action by making some decisions.
    - After performing the action, the agent receives a reward and is reinforced accordingly, and the model stores the information as state-action pairs.
    Some algorithms:
    - Temporal Difference (TD)
    - Q-Learning
    - Deep Adversarial Networks
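The report only names Q-Learning; as a hedged illustration of the observe, act, receive-reward, store-state-action loop listed above, the sketch below applies the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max Q(s', a') − Q(s, a)]. The states, actions, rewards, and the helper name `q_update` are hypothetical.

```python
def q_update(q_table, state, action, reward, next_state, actions, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update to q_table in place and return the new value."""
    # Best value currently known for the state the agent landed in.
    best_next = max(q_table.get((next_state, a), 0.0) for a in actions)
    old_value = q_table.get((state, action), 0.0)
    target = reward + gamma * best_next
    # Move the stored state-action value part of the way towards the target.
    q_table[(state, action)] = old_value + alpha * (target - old_value)
    return q_table[(state, action)]


actions = ["left", "right"]
q_table = {}  # maps (state, action) pairs to learned values

# Hypothetical experience: from state 1, moving right reaches the goal (state 2), reward 1.
first = q_update(q_table, state=1, action="right", reward=1.0, next_state=2, actions=actions)

# From state 0, moving right reaches state 1 with no immediate reward.
second = q_update(q_table, state=0, action="right", reward=0.0, next_state=1, actions=actions)

print(first, second)
```

The second update is non-zero even though its immediate reward is zero, because the value stored for state 1 is propagated backwards through the discount factor.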
  • 26. 1.4 Linear Regression
    In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.
    In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Such models are called linear models. Most commonly, the conditional mean of the response given the values of the explanatory variables (or predictors) is assumed to be an affine function of those values; less commonly, the conditional median or some other quantile is used. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of the response given the values of the predictors, rather than on the joint probability distribution of all of these variables, which is the domain of multivariate analysis.
    Linear regression was the first type of regression analysis to be studied rigorously and to be used extensively in practical applications. This is because models which depend linearly on their unknown parameters are easier to fit than models which are non-linearly related to their parameters, and because the statistical properties of the resulting estimators are easier to determine.
    Figure 1.4 Linear Regression
  • 27. Linear regression has many practical uses. Most applications fall into one of the following two broad categories:
    - If the goal is prediction, forecasting, or error reduction, linear regression can be used to fit a predictive model to an observed data set of values of the response and explanatory variables. After developing such a model, if additional values of the explanatory variables are collected without an accompanying response value, the fitted model can be used to make a prediction of the response.
    - If the goal is to explain variation in the response variable that can be attributed to variation in the explanatory variables, linear regression analysis can be applied to quantify the strength of the relationship between the response and the explanatory variables, and in particular to determine whether some explanatory variables may have no linear relationship with the response at all, or to identify which subsets of explanatory variables may contain redundant information about the response.
    Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the "lack of fit" in some other norm (as with least absolute deviations regression), or by minimizing a penalized version of the least squares cost function as in ridge regression (L2-norm penalty) and lasso (L1-norm penalty). Conversely, the least squares approach can be used to fit models that are not linear models. Thus, although the terms "least squares" and "linear model" are closely linked, they are not synonymous.
    Given a data set $\{y_i, x_{i1}, \ldots, x_{ip}\}_{i=1}^{n}$ of $n$ statistical units, a linear regression model assumes that the relationship between the dependent variable $y$ and the $p$-vector of regressors $\mathbf{x}$ is linear. This relationship is modeled through a disturbance term or error variable $\varepsilon$, an unobserved random variable that adds "noise" to the linear relationship between the dependent variable and the regressors. Thus the model takes the form
    $$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i = \mathbf{x}_i^{\mathsf{T}} \boldsymbol{\beta} + \varepsilon_i, \qquad i = 1, \ldots, n,$$
  • 28. where $\mathsf{T}$ denotes the transpose, so that $\mathbf{x}_i^{\mathsf{T}} \boldsymbol{\beta}$ is the inner product between the vectors $\mathbf{x}_i$ and $\boldsymbol{\beta}$. Often these $n$ equations are stacked together and written in matrix notation as
    $$\mathbf{y} = X \boldsymbol{\beta} + \boldsymbol{\varepsilon}.$$
    The very simplest case of a single scalar predictor variable $x$ and a single scalar response variable $y$ is known as simple linear regression. The extension to multiple and/or vector-valued predictor variables (denoted with a capital $X$) is known as multiple linear regression, also known as multivariable linear regression (not to be confused with multivariate linear regression).
    Multiple linear regression is a generalization of simple linear regression to the case of more than one independent variable, and a special case of general linear models, restricted to one dependent variable. The basic model for multiple linear regression is
    $$Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i$$
    for each observation $i = 1, \ldots, n$. In the formula above we consider $n$ observations of one dependent variable and $p$ independent variables. Thus, $Y_i$ is the $i$th observation of the dependent variable, and $x_{ij}$ is the $i$th observation of the $j$th independent variable, $j = 1, 2, \ldots, p$. The values $\beta_j$ represent parameters to be estimated, and $\varepsilon_i$ is the $i$th independent, identically distributed normal error.
    In the more general multivariate linear regression, there is one equation of the above form for each of $m > 1$ dependent variables that share the same set of explanatory variables and hence are estimated simultaneously with each other:
    $$Y_{ij} = \beta_{0j} + \beta_{1j} x_{i1} + \cdots + \beta_{pj} x_{ip} + \varepsilon_{ij}$$
    for all observations indexed as $i = 1, \ldots, n$ and for all dependent variables indexed as $j = 1, \ldots, m$.
    Nearly all real-world regression models involve multiple predictors, and basic descriptions of linear regression are often phrased in terms of the multiple regression model. Note, however, that in these cases the response variable $y$ is still a scalar. Another term, multivariate linear regression, refers to cases where $y$ is a vector, i.e., the same as general linear regression.
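As a minimal sketch of fitting the multiple linear regression model above by least squares, the following uses NumPy's `numpy.linalg.lstsq`. The two features (kilometres driven and age), the coefficient values, and the data are assumptions invented for this illustration, and the prices are generated without noise so that the recovered coefficients are easy to check.

```python
import numpy as np

# Hypothetical toy data: columns are [kilometres driven (in 10,000 km), age (years)].
X = np.array([
    [1.0, 1.0],
    [2.0, 1.0],
    [3.0, 2.0],
    [5.0, 3.0],
    [6.0, 5.0],
])
# Prices generated from y = 10 - 0.5 * km - 1.0 * age, with no noise (illustration only).
y = 10.0 - 0.5 * X[:, 0] - 1.0 * X[:, 1]

# Prepend a column of ones so that beta[0] plays the role of the intercept beta_0.
X_design = np.column_stack([np.ones(len(X)), X])

# Least-squares estimate of beta in y = X beta + eps.
beta, residuals, rank, singular_values = np.linalg.lstsq(X_design, y, rcond=None)
print("estimated beta:", np.round(beta, 6))

# Predict the response for a new observation: 40,000 km driven, 2 years old.
x_new = np.array([1.0, 4.0, 2.0])
predicted_price = float(x_new @ beta)
print("predicted price:", round(predicted_price, 6))
```

With real, noisy data the estimates would only approximate the underlying coefficients, and the fit would be judged with error measures rather than by exact recovery.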
  • 29. 1.4.1 Types of loss in a linear model:
    L1 loss: This is the absolute difference between the predicted and actual values. The absolute errors of all $N$ observations are summed and divided by $N$, which gives the mean absolute error (MAE):
    $$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \lvert y_i - \hat{y}_i \rvert$$
    where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value of $y$.
    L2 loss: In this loss, we take the average of the squared differences between the predicted and actual values. It is also known as the mean squared error (MSE):
    $$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$$
    RMSE: The root mean squared error is the square root of the L2 loss, i.e. of the MSE:
    $$\mathrm{RMSE} = \sqrt{\mathrm{MSE}} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2}$$
    R-squared: It indicates how well the line predicted by the model fits the actual values of the data. The coefficient typically lies between 0 and 1, and a value close to 1 indicates a well-fitted line:
    $$R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2}$$
    where $\hat{y}_i$ is the predicted value of $y$ and $\bar{y}$ is the mean value of $y$.
  • 30. Note: In the presence of outliers, L1 loss is preferable, because L2 loss squares each error and therefore inflates the contribution of outliers. Alternatively, the outliers can be removed first and L2 loss used afterwards.
    Learning Rate: Alpha ($\alpha$) is the learning rate in the gradient descent update rule. It controls the step size, and hence the speed, with which gradient descent approaches the minimum. The value of $\alpha$ should be chosen so that the updates neither overshoot the minimum nor take too long to reach it:
    $$\theta_{\text{new}} = \theta_{\text{old}} - \alpha \, \frac{\partial L}{\partial \theta_{\text{old}}}$$
    1.4.2 Gradient Descent: To update the $\theta_1$ and $\theta_2$ values in order to reduce the cost function (minimizing the RMSE value) and achieve the best-fit line, the model uses gradient descent. The idea is to start with random $\theta_1$ and $\theta_2$ values and then iteratively update them until the cost reaches its minimum.
    1.4.3 One Hot Encoding: Most machine learning algorithms cannot work with categorical data, which therefore needs to be converted into numerical data. Datasets sometimes contain columns with categorical features (string values); for example, the parameter Gender will have categorical values such as Male and Female. These labels have no specific order of preference. One approach is label encoding, where we assign a numerical value to each label, for example Male and Female mapped to 0 and 1. But this can add bias to the model: it may misinterpret the numbers as a hierarchy and give higher preference to Female because 1 > 0, whereas ideally both labels are equally important in the dataset. To deal with this issue we use the One Hot Encoding technique. In this technique, the categorical parameter is expanded into separate columns for both Male
  • 31. and Female labels. So, wherever the label is Male, the value will be 1 in the Male column and 0 in the Female column, and vice versa. Let's understand with an example: consider data in which fruits, their corresponding categorical values, and their prices are given.
    1.5 Objective & Problem Statement
    Objective of the Project - The goal of this project is to create an efficient and effective model that can predict the price of a used car with good accuracy using the Linear Regression algorithm. The model takes the following inputs:
    - Brand or type of car one prefers, such as Ford or Hyundai
    - Model of the car, such as Ford Figo or Hyundai Creta
    - Year of manufacture, such as 2020 or 2021
    - Type of fuel, such as Petrol or Diesel
    - Number of kilometers the car has travelled
    Problem Statement - It is easy for a company to price its new cars based on the manufacturing and marketing costs involved. For a used car, however, it is quite difficult to define a price, because it is influenced by various parameters such as car brand, year of manufacture, etc. The goal of our project is to predict the best price for a pre-owned car in the Indian market, based on previous data on sold cars, using Linear Regression.
    1.6 Purpose of Project
    The used car market is an ever-rising industry, which has almost doubled its market value in the last few years. The emergence of online portals such as CarDekho, Quikr, Carwale, Cars24, and many others has increased the need for both the customer and the seller to be better informed about the trends and patterns that determine the value of a used car in the market. Machine learning algorithms can be used to predict the retail value of a car based on a certain set of features. The purpose of this project is to provide car price prediction using machine learning without any human interference. Cars are bought and sold every day. At present there are few facilities and applications for obtaining an appropriate price for one's car. Now we use
  • 32. this application to get an estimated value of the car.
    1.7 Architecture Diagram
    Fig 1.7 Architecture of Linear Regression (M.L)
    1.8 Project Goal
    We are required to model the price of cars using the available independent variables. The model will be used by management to understand exactly how prices vary with the independent variables. They can accordingly adjust the design of the cars, the business strategy, etc. to meet certain price levels. Further, the model will be a good way for management to understand the pricing dynamics of a new market.
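The ideas from Sections 1.4.1 to 1.4.3 can be tied to this goal in one small sketch: one-hot encode a categorical attribute, fit the parameters by gradient descent, and score the fit with MAE, RMSE, and R-squared. Everything here is an assumption for illustration: the six toy records, the learning rate, the zero starting point (the text suggests random starting values), and the choice to minimise MSE, which has the same minimiser as RMSE. It is not the project's actual implementation.

```python
import math

# Hypothetical toy records: (fuel type, age in years, price in lakh rupees).
# Prices follow 8 (Petrol) or 9 (Diesel) minus 1 per year of age, with no noise.
cars = [
    ("Petrol", 1.0, 7.0), ("Diesel", 1.0, 8.0),
    ("Petrol", 3.0, 5.0), ("Diesel", 3.0, 6.0),
    ("Petrol", 5.0, 3.0), ("Diesel", 5.0, 4.0),
]
FUELS = ["Petrol", "Diesel"]


def one_hot(fuel):
    """One column per fuel label: 1.0 where the label matches, else 0.0."""
    return [1.0 if fuel == label else 0.0 for label in FUELS]


def predict(theta, row):
    return sum(t * x for t, x in zip(theta, row))


def mae(actual, predicted):
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    return sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(mse(actual, predicted))


def r_squared(actual, predicted):
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Feature rows: [is_petrol, is_diesel, age]; the one-hot columns act as per-fuel intercepts.
X = [one_hot(fuel) + [age] for fuel, age, _ in cars]
y = [price for _, _, price in cars]

theta = [0.0, 0.0, 0.0]  # starting point (zeros here, for reproducibility)
alpha = 0.05             # learning rate, chosen by hand for this toy data
for _ in range(5000):
    errors = [predict(theta, row) - target for row, target in zip(X, y)]
    # Gradient of MSE with respect to each theta_j: (2 / N) * sum(error * x_j).
    grads = [2.0 / len(X) * sum(e * row[j] for e, row in zip(errors, X)) for j in range(3)]
    # Update rule: theta_new = theta_old - alpha * dL/dtheta_old.
    theta = [t - alpha * g for t, g in zip(theta, grads)]

fitted = [predict(theta, row) for row in X]
print("theta:", [round(t, 3) for t in theta])
print("MAE:", round(mae(y, fitted), 6), "RMSE:", round(rmse(y, fitted), 6),
      "R^2:", round(r_squared(y, fitted), 6))
```

Because the toy prices are noise-free, the errors shrink towards zero and R-squared approaches 1; on real listings the same measures would show how far the fitted line is from the data. A learning rate that is too large for the feature scale makes the updates diverge, which is the overshooting described in Section 1.4.1.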
  • 33. CHAPTER 2
  • 34. 2. SYSTEM REQUIREMENT SPECIFICATIONS
    2.1 What is SRS?
    The Software Requirement Specification (SRS) is the starting point of the software development activity. As systems grew more complex, it became evident that the goal of the entire system could not be easily comprehended; hence the need for a requirements phase arose. A software project is initiated by the client's needs. The SRS is the means of translating the ideas in the minds of the clients (the input) into a formal document (the output of the requirements phase).
    The SRS phase consists of two basic activities:
    Problem/Requirement Analysis: This activity, the harder and more nebulous of the two, deals with understanding the problem, the goals, and the constraints.
    Requirement Specification: Here, the focus is on specifying what has been found during analysis. Issues such as representation, specification languages and tools, and checking of the specifications are addressed during this activity.
    The requirements phase terminates with the production of the validated SRS document. Producing the SRS document is the basic goal of this phase.
    2.1.1 Role of SRS:
    The purpose of the Software Requirement Specification is to reduce the communication gap between the clients and the developers. The SRS is the medium through which the client and user needs are accurately specified. It forms the basis of software development. A good SRS should satisfy all the parties involved in the system.
    2.2 Requirements Specification Document
    A Software Requirements Specification (SRS) is a document that describes the nature of a project, software, or application. In simple words, an SRS document is a manual of a project, provided it is prepared before you kick-start the project/application. This
  • 35. document is also known as an SRS report or software document. A software document is primarily prepared for a project, software or any kind of application. There is a set of guidelines to be followed while preparing the software requirement specification document. It includes the purpose, scope, functional and non-functional requirements, and the software and hardware requirements of the project. In addition, it contains information about the required environmental conditions, safety and security requirements, software quality attributes of the project, etc.
The purpose of the SRS document is to describe the external behavior of the application or software being developed. It defines the operations, performance, interfaces and quality assurance requirements of the application or software. The complete software requirements for the system are captured by the SRS. This section introduces the requirement specification document for Car Price Prediction using Linear Regression, which lists functional as well as non-functional requirements.
2.2 Functional Requirements
To document the functional requirements, the set of functionalities supported by the system is specified. A function can be specified by identifying the state at which data is to be input to the system, its input data domain, the output domain, and the type of processing to be carried out on the input data to obtain the output data. Functional requirements define specific behavior or functions of the application. Following are the functional requirements:
FR1) After registration, the user's details should be stored in MySQL.
FR2) Entering login details should show the user's data.
FR3) The login page should redirect to the next page (home).
FR4) The attributes should be shown after redirecting to the home page.
FR5) After the attributes are entered, the price prediction should be shown.
2.3 Non-Functional Requirements
A non-functional requirement is a requirement that specifies criteria that can be used to judge the operation of a system, rather than specific behaviors. In particular, these are
  • 36. the constraints the system must work within. Following are the non-functional requirements:
NFR 1) The system must work properly without bugs.
NFR 2) There should not be any lag in showing the price.
NFR 3) The database should return the correct user's data.
NFR 4) Attributes must be displayed properly to the user.
2.3.1 Performance
The performance of the developed application can be assessed by measurement. Measuring enables you to identify how the performance of your application stands in relation to your defined performance goals and helps you to identify the bottlenecks that affect it. It shows whether your application is moving toward or away from your performance goals. Defining what you will measure (your metrics) and defining the objective for each metric is a critical part of your testing plan. Performance objectives include the following: response time or latency, throughput, and resource utilization.
2.4 Software Requirements
Operating System : Windows 10/11 or macOS
Platform : Google Colab, PyCharm IDE
Programming Language : Python, SQL
2.5 Hardware Requirements
Processor : Intel Core i3 or above
Hard Disk : 1 TB or above
RAM : 4 GB or above
Internet : 1 Mbps or above (wireless)
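The response-time objective named in section 2.3.1 can be checked with a small timing harness. The sketch below is only an illustration: `predict_price_stub` is a hypothetical stand-in for the real prediction call, and the 200 ms goal is an assumed figure, not one stated in this report.

```python
import math
import statistics
import time


def measure_response_time(func, *args, repeats=50):
    """Call func(*args) repeatedly and return timing statistics in milliseconds."""
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    samples_ms.sort()
    # Nearest-rank 95th percentile of the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(samples_ms)) - 1)
    return {
        "mean_ms": statistics.mean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "p95_ms": samples_ms[p95_index],
        "max_ms": samples_ms[-1],
    }


def predict_price_stub(year, kms_driven):
    """Hypothetical stand-in for the trained model's prediction call."""
    return 50_000.0 + 20_000.0 * (year - 2000) - 0.5 * kms_driven


RESPONSE_TIME_GOAL_MS = 200.0  # assumed objective, not taken from the report

stats = measure_response_time(predict_price_stub, 2015, 45_000)
print(stats)
print("goal met:", stats["p95_ms"] <= RESPONSE_TIME_GOAL_MS)
```

Reporting a high percentile alongside the mean is a common choice because occasional slow responses are what a user perceives as lag (NFR 2).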
  • 37. What is STLC?
The process of testing software in a well-planned and systematic way is known as the software testing life cycle (STLC). Different organizations have different phases in their STLC; however, a generic STLC for the waterfall development model consists of the following phases:
1. Requirements Analysis
2. Test Planning
3. Test Analysis
4. Test Design
5. Test Construction and Verification
6. Test Execution and Bug Reporting
7. Final Testing and Implementation
8. Post Implementation
2.6 Requirements Analysis
In this phase testers analyse the customer requirements and work with developers during the design phase to see which requirements are testable and how they are going to test those requirements. It is very important to start testing activities from the requirements phase itself, because the cost of fixing a defect is much lower if it is found in the requirements phase rather than in later phases. In the test planning phase, all the planning about testing is done: what needs to be tested, how the testing will be done, the test strategy to be followed, the test environment, the test methodologies, hardware and software availability, resources, risks, etc. A high-level test plan document is created which includes all the planning inputs mentioned above and is circulated to the stakeholders.
2.7 Test Construction and Verification
In this phase testers prepare more test cases, keeping in mind positive and negative scenarios, end-user scenarios, etc. All the test cases and automation scripts need to be completed in this phase and reviewed by the stakeholders. The test plan document should also be finalized and verified by reviewers.
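The positive and negative scenarios mentioned in section 2.7 can be written as automated test cases. The following is a minimal sketch using Python's standard `unittest` module. The function `validate_car_inputs`, its accepted year range and its list of fuel types are assumptions made for illustration and are not taken from the project's code.

```python
import unittest


def validate_car_inputs(year, kms_driven, fuel_type):
    """Hypothetical input check run before the model is asked for a price."""
    if not (1990 <= year <= 2023):  # assumed valid range
        raise ValueError("year out of range")
    if kms_driven < 0:
        raise ValueError("kms_driven must be non-negative")
    if fuel_type not in {"Petrol", "Diesel", "LPG"}:  # assumed fuel types
        raise ValueError("unknown fuel type")
    return True


class ValidateCarInputsTest(unittest.TestCase):
    def test_valid_input_is_accepted(self):
        # Positive scenario: well-formed attributes pass validation.
        self.assertTrue(validate_car_inputs(2015, 45_000, "Petrol"))

    def test_negative_kms_is_rejected(self):
        # Negative scenario: an impossible odometer reading.
        with self.assertRaises(ValueError):
            validate_car_inputs(2015, -10, "Petrol")

    def test_implausible_year_is_rejected(self):
        # Negative scenario: a value that is not a plausible model year.
        with self.assertRaises(ValueError):
            validate_car_inputs(150, 45_000, "Diesel")


def run_suite():
    """Run the test case class and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ValidateCarInputsTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```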
  • 38. 2.8 Test Execution and Bug Reporting
Once unit testing is done by the developers and the test team receives the test build, the test cases are executed and defects are reported in the bug tracking tool. After test execution is complete and all the defects are reported, test execution reports are created and circulated to the project stakeholders. After the developers fix the bugs raised by the testers, they give another build with the fixes to the testers, who do re-testing and regression testing to ensure that each defect has been fixed and that the fix has not affected any other area of the software. Testing is an iterative process: if a defect is found and fixed, testing needs to be repeated after every defect fix. Once the testers are assured that the defects have been fixed and no more critical defects remain in the software, the build is given for final testing.
2.9 Final Testing and Implementation
In this phase the final testing of the software is done. Non-functional testing such as stress, load and performance testing is performed in this phase. The software is also verified in a production-like environment. Final test execution reports and documents are prepared in this phase.
2.10 Post Implementation
In this phase the test environment is cleaned up and restored to its default state, process review meetings are held and lessons learnt are documented. A document is prepared to help cope with similar problems in future releases.
  • 39. Phase | Activities | Outcome
Planning | Create high level test plan | Test plan, Refined Specification
Analysis | Create detailed test plan, Functional Validation Matrix, test cases | Revised Test Plan, Functional Validation Matrix, test cases
Design | Test cases are revised, select which test cases to automate | Revised test cases, test data sets, risk assessment sheet
Construction | Scripting of test cases to automate | Test procedures/Scripts, Drivers, test results, Bug reports
Testing cycles | Complete testing cycles | Test results, Bug reports
Final testing | Execute remaining stress and performance tests, complete documentation | Test results and different metrics on test efforts
Post implementation | Evaluate testing processes | Plan for improvement of testing process
Table 3.7 – Activities and Outcomes of each phase in SDLC
2.11 Technologies Used:
2.11.1 Google Colab
Colaboratory, or "Colab" for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary Python code through the browser, and is especially well suited to machine learning, data analysis and education. More technically, Colab is a hosted Jupyter notebook service that requires no setup to use, while providing access free of charge to computing resources including GPUs.
  • 40. Is Google Colab like Jupyter Notebook?
Google Colab's major differentiator from Jupyter Notebook is that it is cloud-based and Jupyter is not. This means that if you work in Google Colab, you do not have to worry about downloading and installing anything on your hardware.
Fig 3.8.1 – Google colab
2.11.2 PyCharm IDE
PyCharm is a dedicated Python Integrated Development Environment (IDE) providing a wide range of essential tools for Python developers, tightly integrated to create a convenient environment for productive Python, web, and data science development. PyCharm is developed by JetBrains s.r.o. (formerly IntelliJ Software s.r.o.), a Czech software development company which makes tools for software developers and project managers. The company offers integrated development environments (IDEs) for the programming languages Java, Groovy, Kotlin, Ruby, Python, PHP, C, Objective-C, C++, C#, F#, Go, JavaScript, and the domain-specific language SQL.
2.11.3 SQL
SQL (Structured Query Language) is a powerful and standard query language for relational database systems. We use SQL to perform CRUD (Create, Read, Update, Delete) operations on databases, along with various other operations. SQL has evolved a lot in the past decade.
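The four CRUD operations named in section 2.11.3 can be sketched in a few lines. The project stores its data in MySQL. To keep the example self-contained, the sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in, and the `users` table and its columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the project's MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users ("
    "id INTEGER PRIMARY KEY, "
    "username TEXT UNIQUE NOT NULL, "
    "email TEXT NOT NULL)"
)

# Create: insert a row, using placeholders rather than string formatting.
conn.execute(
    "INSERT INTO users (username, email) VALUES (?, ?)",
    ("demo_user", "demo@example.com"),
)

# Read: fetch the row back.
row = conn.execute(
    "SELECT username, email FROM users WHERE username = ?", ("demo_user",)
).fetchone()

# Update: change one column of the existing row.
conn.execute(
    "UPDATE users SET email = ? WHERE username = ?",
    ("new@example.com", "demo_user"),
)
updated_email = conn.execute(
    "SELECT email FROM users WHERE username = ?", ("demo_user",)
).fetchone()[0]

# Delete: remove the row and confirm the table is empty.
conn.execute("DELETE FROM users WHERE username = ?", ("demo_user",))
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
conn.close()

print(row, updated_email, remaining)
```

The SQL statements themselves are standard. With a MySQL driver, mainly the connection call and the placeholder style would be expected to differ.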
  • 41. RDBMS
RDBMS stands for Relational Database Management System. RDBMS is the basis for SQL and for all modern database systems such as MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access. The data in an RDBMS is stored in database objects called tables. A table is a collection of related data entries and it consists of columns and rows. Although SQL is an ANSI/ISO standard, there are different versions of the SQL language. However, to be compliant with the ANSI standard, they all support at least the major commands and clauses (such as SELECT, UPDATE, DELETE, INSERT, WHERE) in a similar manner.
MySQL, the most popular open source SQL database management system, is developed, distributed, and supported by Oracle Corporation. MySQL is a database management system. A database is a structured collection of data. It may be anything from a simple shopping list to a picture gallery or the vast amounts of information in a corporate network. To add, access, and process data stored in a computer database, you need a database management system such as MySQL Server. Since computers are very good at handling large amounts of data, database management systems play a central role in computing, as standalone utilities or as parts of other applications.
Using SQL in Your Web Site
To build a web site that shows data from a database, you will need:
- An RDBMS database program (e.g. MS Access, SQL Server, MySQL)
- A server-side scripting language, like PHP or Python
- SQL to get the data you want
- HTML / CSS to style the page
2.11.4 Flask
Flask is a micro web framework written in Python. It is classified as a micro framework because it does not require particular tools or libraries. It has no database abstraction layer, form validation, or any other components where pre-existing third-party libraries provide common functions.
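To make the Flask description concrete, here is a minimal sketch of a single route that accepts form data and returns a price as JSON. It assumes Flask is installed. The route name `/predict`, the form fields `year` and `kms_driven`, and the function `predict_price_stub` are hypothetical placeholders. The actual application would pass the submitted attributes to the trained regression model instead.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_price_stub(year, kms_driven):
    """Hypothetical placeholder; the real app would call the trained model."""
    return max(0.0, 50_000.0 + 20_000.0 * (year - 2000) - 0.5 * kms_driven)


@app.route("/predict", methods=["POST"])
def predict():
    # Read and validate the submitted form fields.
    try:
        year = int(request.form["year"])
        kms_driven = int(request.form["kms_driven"])
    except (KeyError, ValueError):
        return jsonify(error="year and kms_driven must be integers"), 400
    return jsonify(price=round(predict_price_stub(year, kms_driven), 2))


if __name__ == "__main__":
    app.run(debug=True)
```

Because Flask ships without form validation, the route checks its own inputs and answers with HTTP 400 when they are missing or malformed.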
  • 42. CHAPTER -3
  • 43. 3. LITERATURE SURVEY
3.1 Paper work
Overfitting and underfitting come into the picture when we create our statistical models. The models might be too biased towards the training data and might not perform well on the test data set. This is called overfitting. Likewise, the models might not take into consideration all the variance present in the population and perform poorly on a test data set. This is called underfitting. A balance needs to be achieved between these two, which leads to the concept of the bias-variance tradeoff. Pierre Geurts has introduced and explained how the bias-variance tradeoff is achieved in both regression and classification. The selection of variables/attributes plays a vital role in influencing both the bias and the variance of the statistical model. Robert Tibshirani proposed a method called Lasso, which minimizes the residual sum of squares subject to a bound on the sum of the absolute values of the coefficients. This shrinks some coefficients to exactly zero and so returns a subset of attributes to be included in multiple regression to get a low error rate. Similarly, decision trees suffer from overfitting if they are not pruned/shrunk. Trevor Hastie and Daryl Pregibon have explained the concept of pruning in their research paper. Moreover, hypothesis testing using ANOVA is needed to verify whether the different groups of errors really differ from each other. This is explained by Tae Kyun Kim in his paper. A post-hoc test needs to be performed along with ANOVA if the number of groups exceeds two. Tukey's test has been explored by Haynes W. in his research paper. Using these techniques, we will create, train and test the effectiveness of our statistical models.
The paper "Predicting the Price of Used Cars Using Machine Learning Techniques" investigates the application of supervised machine learning techniques to predict the price of used cars in Mauritius.
The predictions are based on historical data collected from daily newspapers. Different techniques such as multiple linear regression analysis, k-nearest neighbors, naïve Bayes and decision trees have been used to make the predictions.
The paper "Car Price Prediction Using Machine Learning Techniques" examines a considerable number of distinct attributes for reliable and accurate prediction. To build a model for predicting the price of used cars in Bosnia and Herzegovina, the authors applied three machine learning techniques (Artificial Neural Network, Support Vector Machine and Random Forest).
  • 44. The paper "Price Evaluation Model in Second Hand Car System Based on BP Neural Networks" proposes a price evaluation model based on big data analysis, which takes advantage of widely circulated vehicle data and a large number of vehicle transaction records to analyze the price data for each type of vehicle using an optimized BP neural network algorithm. It aims to establish a second-hand car price evaluation model that gives the price that best matches the car.
3.2 PROPOSED MODEL
Null Hypothesis
Even though the magnitude of overfitting is reduced by pruning, regression trees still suffer from overfitting after pruning. This leads to the following hypothesis.
Hypothesis: Multiple and Lasso regressions are better at predicting price than the regression tree.
Training and Testing Data
The data is split into training (70% - 563 records) and testing (30% - 241 records) data sets through random sampling (the seed was set to 2786).
Linear Regression
In statistics, linear regression is a linear approach for modelling the relationship between a scalar response and one or more explanatory variables (also known as dependent and independent variables). The case of one explanatory variable is called simple linear regression; for more than one, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable.
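The seeded 70/30 split and the simple linear regression case described above can be sketched without any external library. The total of 804 records is the sum of the two counts quoted in the text. The helper names are hypothetical, and Python's random generator seeded with 2786 will not reproduce the exact rows chosen by whatever tool the original study used.

```python
import random


def train_test_split_indices(n_records, train_fraction=0.7, seed=2786):
    """Shuffle record indices with a fixed seed and cut them into two groups."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_train = round(n_records * train_fraction)
    return indices[:n_train], indices[n_train:]


def fit_simple_linear_regression(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


train_idx, test_idx = train_test_split_indices(804)
print(len(train_idx), len(test_idx))  # 563 241

# Points that lie exactly on y = 1 + 2x recover that line.
intercept, slope = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(intercept, slope)
```

Fixing the seed makes the split repeatable, so that models compared under the hypothesis above are trained and tested on the same records.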
  • 45. 3.3 Related work
Researchers often predict the prices of products using previous data. Pudaruth did so when predicting the prices of cars in Mauritius; these cars were not new but second-hand. He used multiple linear regression, k-nearest neighbors, naïve Bayes and decision tree algorithms to predict the prices. The comparison of prediction results from these techniques showed that the prices from these methods are closely comparable. However, it was found that the decision tree algorithm and the naïve Bayes method were unable to classify and predict numeric values. Pudaruth's research also concluded that a limited number of instances in the data set does not offer high prediction accuracy.
A multivariate regression model helps in classifying and predicting values in numeric format. Kuiper used this model to predict the price of 2005 General Motors (GM) cars. The price prediction of cars does not require any special knowledge, so data available online, such as the data available on www.pakwheels.com, is enough to predict prices. Kuiper also introduced variable selection techniques, which helped in finding which variables are more relevant for inclusion in the model. He encouraged students to use different models and to find out how checking model assumptions works.
Another similar study by Listiani uses Support Vector Machines (SVM) to predict the prices of leased cars. This research showed that SVM is far more accurate in predicting prices than multiple linear regression when a very large data set is available. SVM also handles high-dimensional data better and avoids both the underfitting and overfitting issues. Listiani used a genetic algorithm to find important features for SVM. However, the study does not show, in terms of variance and mean standard deviation, why SVM is better than simple multiple regression.
  • 46. CHAPTER -4
  • 47. 4. SYSTEM DESIGN
4.1 Introduction to UML
The Unified Modeling Language (UML) allows the software engineer to express an analysis model using a modeling notation that is governed by a set of syntactic, semantic and pragmatic rules. A UML system is represented using five different views that describe the system from distinctly different perspectives. Each view is defined by a set of diagrams, as follows:
1. User Model View
This view represents the system from the user's perspective. The analysis representation describes a usage scenario from the end user's perspective.
2. Structural Model View
In this view, the data and functionality are viewed from inside the system. This view models the static structures.
3. Behavioral Model View
This view represents the dynamic or behavioral aspects of the system, depicting the interactions between the various structural elements described in the user model and structural model views.
4. Implementation Model View
In this view, the structural and behavioral aspects of the system are represented as they are to be built.
5. Environmental Model View
In this view, the structural and behavioral aspects of the environment in which the system is to be implemented are represented.
4.2 UML Diagrams
4.2.1 Use Case Diagram
To model a system, the most important aspect is to capture its dynamic behavior, that is, the behavior of the system when it is running.
  • 48. Static behavior alone is not sufficient to model a system; dynamic behavior is more important. UML provides five diagrams for modeling dynamic behavior, and the use case diagram is one of them. Because the use case diagram is dynamic in nature, there must be some internal or external factors that drive the interaction. These internal and external agents are known as actors. Use case diagrams therefore consist of actors, use cases and their relationships. The diagram is used to model a system or subsystem of an application. A single use case diagram captures a particular functionality of a system, so a number of use case diagrams are used to model the entire system. Use case diagrams are used to gather the requirements of a system, including internal and external influences. These requirements are mostly design requirements. When a system is analysed to gather its functionalities, use cases are prepared and actors are identified. In brief, the purposes of use case diagrams are as follows:
a. Gather the requirements of a system.
b. Get an outside view of a system.
c. Identify external and internal factors influencing the system.
d. Show the interactions among the requirements and actors.
Fig 4.2.1 – Use Case Diagram
  • 49. 4.2.2 Sequence Diagram
Sequence diagrams describe interactions among classes in terms of an exchange of messages over time. They are also called event diagrams. A sequence diagram is a good way to visualize and validate various runtime scenarios. These can help to predict how a system will behave and to discover responsibilities a class may need to have in the process of modelling a new system. The aim of a sequence diagram is to define event sequences that lead to a desired outcome. The focus is more on the order in which messages occur than on the messages themselves. However, most sequence diagrams communicate both what messages are sent and the order in which they tend to occur.
Basic Sequence Diagram Notations
Class Roles or Participants
Class roles describe the way an object will behave in context. Use the UML object symbol to illustrate class roles, but do not list object attributes.
Activation or Execution Occurrence
Activation boxes represent the time an object needs to complete a task. When an object is busy executing a process or waiting for a reply message, use a thin grey rectangle placed vertically on its lifeline.
  • 50. Fig 4.2.2 – Sequence Diagram
  • 51. 4.2.3 Class Diagram
Class diagrams are the main building blocks of every object-oriented method. A class diagram can be used to show classes, relationships, interfaces, associations, and collaborations. Class diagrams are standardized in UML. Each class is represented by a rectangle subdivided into three compartments: name, attributes and operations. Since classes are the building blocks of an application based on OOP, the class diagram has an appropriate structure to represent classes, inheritance, relationships, and everything else that OOP has in its context. It describes various kinds of objects and the static relationships between them.
The main purposes of class diagrams are:
1. This is the only UML diagram which can appropriately depict the various aspects of the OOP concept.
2. Proper design and analysis of an application can be faster and more efficient.
3. It is the base for deployment and component diagrams.
Figure 4.2.3 Class Diagram
  • 52. 4.2.4 System Design
A software module is the lowest level of design granularity in the system. Depending on the software development approach, there may be one or more modules per system. This section should provide enough detailed information about logic and data necessary to completely write source code for all modules in the system (and/or integrate COTS software programs). If there are many modules or if the module documentation is extensive, place it in an appendix or reference a separate document. Add additional diagrams and information, if necessary, to describe each module, its functionality, and its hierarchy. Industry-standard module specification practices should be followed. Include the following information in the detailed module designs:
- A narrative description of each module, its function(s), the conditions under which it is used (called or scheduled for execution), its overall processing, logic, interfaces to other modules, interfaces to external systems, security requirements, etc.; explain any algorithms used by the module in detail
- For COTS packages, specify any call routines or bridging programs to integrate the package with the system and/or other COTS packages (for example, Dynamic Link Libraries)
- Data elements, record structures, and file structures associated with module input and output
- Graphical representation of the module processing, logic, flow of control, and algorithms, using an accepted diagramming approach (for example, structure charts, action diagrams, flowcharts, etc.)
  • 53. - Data entry and data output graphics; define or reference associated data elements; if the project is large and complex or if the detailed module designs will be incorporated into a separate document, then it may be appropriate to repeat the screen information in this section
- Report layout
Figure 4.2.4 System Design
  • 54. 4.2.5 State Chart Diagram
As its name suggests, a Statechart diagram describes the different states of a component in a system. The states are specific to a component/object of the system. A Statechart diagram describes a state machine. A state machine can be defined as a machine which defines the different states of an object, where these states are controlled by external or internal events. The activity diagram, explained in the next chapter, is a special kind of Statechart diagram. As a Statechart diagram defines the states, it is used to model the lifetime of an object.
4.2.5.1 How to Draw a Statechart Diagram?
A Statechart diagram is used to describe the states of different objects over their life cycle. Emphasis is placed on the state changes caused by internal or external events. It is important to analyze these states of objects and implement them accurately. States can be identified as the condition of objects when a particular event occurs. Before drawing a Statechart diagram we should clarify the following points:
- Identify the important objects to be analyzed.
- Identify the states.
- Identify the events.
Following is an example of a Statechart diagram where the state of an Order object is analyzed. The first state is an idle state from which the process starts. The next states are reached on events such as send request, confirm request, and dispatch order. These events are responsible for the state changes of the order object.
  • 55. During its life cycle an object (here the order object) goes through the following states, and there may be some abnormal exits. An abnormal exit may occur due to some problem in the system. When the entire life cycle is complete, it is considered a complete transaction, as shown in the following figure. The initial and final states of the object are also shown in the following figure.
Figure 4.2.5 State Chart Diagram
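The Order example above can be expressed as a small table-driven state machine. This is only a sketch: the text names the idle state and the events send request, confirm request and dispatch order, while the intermediate state names and the `aborted` state used for an abnormal exit are assumptions made for illustration.

```python
# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    ("idle", "send_request"): "request_sent",
    ("request_sent", "confirm_request"): "request_confirmed",
    ("request_confirmed", "dispatch_order"): "order_dispatched",
}
FINAL_STATES = {"order_dispatched", "aborted"}


class Order:
    """Order object whose state changes only in response to events."""

    def __init__(self):
        self.state = "idle"  # initial state

    def handle(self, event):
        if self.state in FINAL_STATES:
            raise RuntimeError(f"order already finished in state {self.state!r}")
        if event == "abnormal_exit":
            # An abnormal exit is possible from any non-final state.
            self.state = "aborted"
            return self.state
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        self.state = TRANSITIONS[key]
        return self.state


order = Order()
for event in ("send_request", "confirm_request", "dispatch_order"):
    print(event, "->", order.handle(event))
```

Keeping the transitions in a dictionary mirrors the diagram directly: each arrow in the Statechart corresponds to one entry.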
  • 56. CHAPTER -5
  • 57. 5. IMPLEMENTATION
5.1 Pseudo Code
Step 1: Import the required packages.
Step 2: Download the dataset and link it to Google Colab.
Step 3: Read the dataset and explore the data.
Step 4: Clean the data.
Step 5: Preprocess the data.
Step 6: Save the cleaned car data set.
Step 7: Start training the machine learning model.
Step 8: Split features and target as x and y respectively.
Step 9: Split the data into 80% training data and 20% testing data.
Step 10: Train the model on the training data and evaluate it on the testing data.
Step 11: Apply a one-hot encoder and column transformer to the categorical features.
Step 12: Apply Linear Regression to the transformed features.
Step 13: Fit the Linear Regression model.
Step 14: If the score on the test data is good, use the model for prediction; otherwise fit the model again using other random states.
Step 15: Dump the Linear Regression model to a file using pickle.
Step 16: Open PyCharm and copy the cleaned car.csv and LinearRegressionModel.pkl files into the project.
Step 17: Read the model and dataset, and make predictions from the web page using Python and Flask.
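Steps 8 to 15 above can be sketched with pandas and scikit-learn, assuming both are installed. The ten rows below are made-up values in the layout of the cleaned data set, used only so that the example runs on its own. The variable names are illustrative, `random_state=0` is an arbitrary choice, and the model is pickled to memory rather than to LinearRegressionModel.pkl.

```python
import pickle

import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Illustrative rows only; not taken from the real data set.
car = pd.DataFrame({
    "name": ["Car A", "Car B", "Car C", "Car D", "Car E",
             "Car A", "Car B", "Car C", "Car D", "Car E"],
    "company": ["X", "Y", "X", "Z", "Y", "X", "Y", "X", "Z", "Y"],
    "year": [2007, 2012, 2015, 2018, 2010, 2014, 2016, 2011, 2019, 2013],
    "kms_driven": [45000, 30000, 22000, 8000, 60000,
                   28000, 15000, 52000, 5000, 36000],
    "fuel_type": ["Petrol", "Diesel", "Petrol", "Diesel", "Petrol",
                  "Petrol", "Diesel", "Petrol", "Diesel", "Petrol"],
    "Price": [80000, 325000, 410000, 830000, 175000,
              290000, 520000, 160000, 900000, 350000],
})

# Step 8: features and target.
X = car.drop(columns="Price")
y = car["Price"]

# Step 9: 80% training data, 20% testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Step 11: one-hot encode the categorical columns, pass numeric ones through.
column_trans = make_column_transformer(
    (OneHotEncoder(handle_unknown="ignore"), ["name", "company", "fuel_type"]),
    remainder="passthrough",
)

# Steps 12-13: linear regression on the transformed features.
pipe = make_pipeline(column_trans, LinearRegression())
pipe.fit(X_train, y_train)

# Step 14: score on the held-out rows (meaningless on toy data).
score = r2_score(y_test, pipe.predict(X_test))

# Step 15: serialise the fitted pipeline with pickle and load it back.
restored = pickle.loads(pickle.dumps(pipe))
print(score, restored.predict(X_test))
```

Pickling the whole pipeline, rather than only the regressor, keeps the encoder and the model together, so the web application can pass raw attribute values straight to `predict`.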
  • 58. 5.2 Google Colab Data Set Implementation:
# Load the raw car listings into a DataFrame
import pandas as pd
car = pd.read_csv("https://raw.githubusercontent.com/rajtilakls2510/car_price_predictor/master/quikr_car.csv")
# Number of rows and columns
car.shape
(892, 6)
# Column types and non-null counts
car.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 892 entries, 0 to 891
Data columns (total 6 columns):
 #   Column      Non-Null Count  Dtype
 0   name        892 non-null    object
 1   company     892 non-null    object
 2   year        892 non-null    object
 3   Price       892 non-null    object
 4   kms_driven  840 non-null    object
 5   fuel_type   837 non-null    object
dtypes: object(6)
memory usage: 41.9+ KB
# Distinct values in the year column
car['year'].unique()
  • 59. array(['2007', '2006', '2018', '2014', '2015', '2012', '2013', '2016',
'2010', '2017', '2008', '2011', '2019', '2009', '2005', '2000',
'...', '150k', 'TOUR', '2003', 'r 15', '2004', 'Zest', '/-Rs',
'sale', '1995', 'ara)', '2002', 'SELL', '2001', 'tion', 'odel',
'2 bs', 'arry', 'Eon', 'o...', 'ture', 'emi', 'car', 'able', 'no.',
'd...', 'SALE', 'digo', 'sell', 'd Ex', 'n...', 'e...', 'D...',
', Ac', 'go .', 'k...', 'o c4', 'zire', 'cent', 'Sumo', 'cab',
't xe', 'EV2', 'r...', 'zest'], dtype=object)
car['Price'].unique()
array(['80,000', '4,25,000', 'Ask For Price', '3,25,000', '5,75,000',
'1,75,000', '1,90,000', '8,30,000', '2,50,000', '1,82,000',
'3,15,000', '4,15,000', '3,20,000', '10,00,000', '5,00,000',
'3,50,000', '1,60,000', '3,10,000', '75,000', '1,00,000',
'2,90,000', '95,000', '1,80,000', '3,85,000', '1,05,000',
'6,50,000', '6,89,999', '4,48,000', '5,49,000', '5,01,000',
'4,89,999', '2,80,000', '3,49,999', '2,84,999', '3,45,000',
'4,99,999', '2,35,000', '2,49,999', '14,75,000', '3,95,000',
'2,20,000', '1,70,000', '85,000', '2,00,000', '5,70,000',
'1,10,000', '4,48,999', '18,91,111', '1,59,500', '3,44,999',
'4,49,999', '8,65,000', '6,99,000', '3,75,000', '2,24,999',
'12,00,000', '1,95,000', '3,51,000', '2,40,000', '90,000',
'1,55,000', '6,00,000', '1,89,500', '2,10,000', '3,90,000',
'1,35,000', '16,00,000', '7,01,000', '2,65,000', '5,25,000',
'3,72,000', '6,35,000', '5,50,000', '4,85,000', '3,29,500',
'2,51,111', '5,69,999', '69,999', '2,99,999', '3,99,999',
'4,50,000', '2,70,000', '1,58,400', '1,79,000', '1,25,000',
'2,99,000', '1,50,000', '2,75,000', '2,85,000', '3,40,000',
'70,000', '2,89,999', '8,49,999', '7,49,999', '2,74,999',
'9,84,999', '5,99,999', '2,44,999', '4,74,999', '2,45,000',
'1,69,500', '3,70,000', '1,68,000', '1,45,000', '98,500',
'2,09,000', '1,85,000', '9,00,000', '6,99,999', '1,99,999',
'5,44,999', '1,99,000', '5,40,000', '49,000',
'7,00,000', '55,000', '8,95,000', '3,55,000', '5,65,000', '3,65,000', '40,000', '4,00,000', '3,30,000', '5,80,000', '3,79,000', '2,19,000', '5,19,000', '7,30,000', '20,00,000', '21,00,000', '14,00,000', KESHAV MEMORIAL INSTITUTE OF TECHNOLOGY Page 41
  • 60.
           '3,11,000', '8,55,000', '5,35,000', '1,78,000', '3,00,000',
           '2,55,000', '5,49,999', '3,80,000', '57,000', '4,10,000',
           '2,25,000', '1,20,000', '59,000', '5,99,000', '6,75,000', '72,500',
           '6,10,000', '2,30,000', '5,20,000', '5,24,999', '4,24,999',
           '6,44,999', '5,84,999', '7,99,999', '4,44,999', '6,49,999',
           '9,44,999', '5,74,999', '3,74,999', '1,30,000', '4,01,000',
           '13,50,000', '1,74,999', '2,39,999', '99,999', '3,24,999',
           '10,74,999', '11,30,000', '1,49,000', '7,70,000', '30,000',
           '3,35,000', '3,99,000', '65,000', '1,69,999', '1,65,000',
           '5,60,000', '9,50,000', '7,15,000', '45,000', '9,40,000',
           '1,55,555', '15,00,000', '4,95,000', '8,00,000', '12,99,000',
           '5,30,000', '14,99,000', '32,000', '4,05,000', '7,60,000',
           '7,50,000', '4,19,000', '1,40,000', '15,40,000', '1,23,000',
           '4,98,000', '4,80,000', '4,88,000', '15,25,000', '5,48,900',
           '7,25,000', '99,000', '52,000', '28,00,000', '4,99,000',
           '3,81,000', '2,78,000', '6,90,000', '2,60,000', '90,001',
           '1,15,000', '15,99,000', '1,59,000', '51,999', '2,15,000',
           '35,000', '11,50,000', '2,69,000', '60,000', '4,30,000',
           '85,00,003', '4,01,919', '4,90,000', '4,24,000', '2,05,000',
           '5,49,900', '3,71,500', '4,35,000', '1,89,700', '3,89,700',
           '3,60,000', '2,95,000', '1,14,990', '10,65,000', '4,70,000',
           '48,000', '1,88,000', '4,65,000', '1,79,999', '21,90,000',
           '23,90,000', '10,75,000', '4,75,000', '10,25,000', '6,15,000',
           '19,00,000', '14,90,000', '15,10,000', '18,50,000', '7,90,000',
           '17,25,000', '12,25,000', '68,000', '9,70,000', '31,00,000',
           '8,99,000', '88,000', '53,000', '5,68,500', '71,000', '5,90,000',
           '7,95,000', '42,000', '1,89,000', '1,62,000', '35,999',
           '29,00,000', '39,999', '50,500', '5,10,000', '8,60,000',
           '5,00,001'], dtype=object)
    car['kms_driven'].unique()
    array(['45,000 kms', '40 kms', '22,000 kms', '28,000 kms', '36,000 kms',
           '59,000 kms', '41,000 kms', '25,000 kms', '24,530 kms', '60,000 kms',
           '30,000 kms', '32,000 kms', '48,660 kms', '4,000 kms', '16,934 kms',
           '43,000 kms', '35,550 kms',
  • 61.
           '39,522 kms', '39,000 kms', '55,000 kms', '72,000 kms', '15,975 kms',
           '70,000 kms', '23,452 kms', '35,522 kms', '48,508 kms', '15,487 kms',
           '82,000 kms', '20,000 kms', '68,000 kms', '38,000 kms', '27,000 kms',
           '33,000 kms', '46,000 kms', '16,000 kms', '47,000 kms', '35,000 kms',
           '30,874 kms', '15,000 kms', '29,685 kms', '1,30,000 kms', '19,000 kms',
           nan, '54,000 kms', '13,000 kms', '38,200 kms', '50,000 kms',
           '13,500 kms', '3,600 kms', '45,863 kms', '60,500 kms', '12,500 kms',
           '18,000 kms', '13,349 kms', '29,000 kms', '44,000 kms', '42,000 kms',
           '14,000 kms', '49,000 kms', '36,200 kms', '51,000 kms', '1,04,000 kms',
           '33,333 kms', '33,600 kms', '5,600 kms', '7,500 kms', '26,000 kms',
           '24,330 kms', '65,480 kms', '28,028 kms', '2,00,000 kms', '99,000 kms',
           '2,800 kms', '21,000 kms', '11,000 kms', '66,000 kms', '3,000 kms',
           '7,000 kms', '38,500 kms', '37,200 kms', '43,200 kms', '24,800 kms',
           '45,872 kms', '40,000 kms', '11,400 kms', '97,200 kms', '52,000 kms',
           '31,000 kms', '1,75,430 kms', '37,000 kms', '65,000 kms', '3,350 kms',
           '75,000 kms', '62,000 kms', '73,000 kms', '2,200 kms', '54,870 kms',
           '34,580 kms', '97,000 kms', '60 kms', '80,200 kms', '3,200 kms',
           '0,000 kms', '5,000 kms', '588 kms', '71,200 kms', '1,75,400 kms',
           '9,300 kms', '56,758 kms', '10,000 kms', '56,450 kms', '56,000 kms',
           '32,700 kms', '9,000 kms', '73 kms', '1,60,000 kms', '84,000 kms',
           '58,559 kms', '57,000 kms', '1,70,000 kms', '80,000 kms', '6,821 kms',
           '23,000 kms', '34,000 kms', '1,800 kms', '4,00,000 kms', '48,000 kms',
           '90,000 kms', '12,000 kms', '69,900 kms', '1,66,000 kms', '122 kms',
           '0 kms', '24,000 kms', '36,469 kms', '7,800 kms', '24,695 kms',
           '15,141 kms', '59,910 kms', '1,00,000 kms', '4,500 kms', '1,29,000 kms',
           '300 kms', '1,31,000 kms', '1,11,111 kms', '59,466 kms', '25,500 kms',
           '44,005 kms', '2,110 kms', '43,222 kms', '1,00,200 kms', '65 kms',
           '1,40,000 kms', '1,03,553 kms', '58,000 kms', '1,20,000 kms', '49,800 kms',
           '100 kms', '81,876 kms', '6,020 kms', '55,700 kms', '18,500 kms',
           '1,80,000 kms', '53,000 kms', '35,500 kms',
  • 62.
           '22,134 kms', '1,000 kms', '8,500 kms', '87,000 kms', '6,000 kms',
           '15,574 kms', '8,000 kms', '55,800 kms', '56,400 kms', '72,160 kms',
           '11,500 kms', '1,33,000 kms', '2,000 kms', '88,000 kms', '65,422 kms',
           '1,17,000 kms', '1,50,000 kms', '10,750 kms', '6,800 kms', '5 kms',
           '9,800 kms', '57,923 kms', '30,201 kms', '6,200 kms', '37,518 kms',
           '24,652 kms', '383 kms', '95,000 kms', '3,528 kms', '52,500 kms',
           '47,900 kms', '52,800 kms', '1,95,000 kms', '48,008 kms', '48,247 kms',
           '9,400 kms', '64,000 kms', '2,137 kms', '10,544 kms', '49,500 kms',
           '1,47,000 kms', '90,001 kms', '48,006 kms', '74,000 kms', '85,000 kms',
           '29,500 kms', '39,700 kms', '67,000 kms', '19,336 kms', '60,105 kms',
           '45,933 kms', '1,02,563 kms', '28,600 kms', '41,800 kms', '1,16,000 kms',
           '42,590 kms', '7,400 kms', '54,500 kms', '76,000 kms', '00 kms',
           '11,523 kms', '38,600 kms', '95,500 kms', '37,458 kms', '85,960 kms',
           '12,516 kms', '30,600 kms', '2,550 kms', '62,500 kms', '69,000 kms',
           '28,400 kms', '68,485 kms', '3,500 kms', '85,455 kms', '63,000 kms',
           '1,600 kms', '77,000 kms', '26,500 kms', '2,875 kms', '13,900 kms',
           '1,500 kms', '2,450 kms', '1,625 kms', '33,400 kms', '60,123 kms',
           '38,900 kms', '1,37,495 kms', '91,200 kms', '1,46,000 kms', '1,00,800 kms',
           '2,100 kms', '2,500 kms', '1,32,000 kms', 'Petrol'], dtype=object)
    car['fuel_type'].unique()
    array(['Petrol', 'Diesel', nan, 'LPG'], dtype=object)
    backup=car.copy()
    car=car[car['year'].str.isnumeric()]
    car['year']=car['year'].astype(int)
    car.info()
    <class 'pandas.core.frame.DataFrame'>
    Int64Index: 842 entries, 0 to 891
  • 63.
    Data columns (total 6 columns):
     #   Column      Non-Null Count  Dtype
     0   name        842 non-null    object
     1   company     842 non-null    object
     2   year        842 non-null    int32
     3   Price       842 non-null    object
     4   kms_driven  840 non-null    object
     5   fuel_type   837 non-null    object
    dtypes: int32(1), object(5)
    memory usage: 42.8+ KB
    # Drop listings without a price, then convert Price to an integer
    car=car[car['Price'] != "Ask For Price"]
    car['Price']=car['Price'].str.replace(',','').astype(int)
    # Keep only the number from values such as '45,000 kms'
    car['kms_driven']=car['kms_driven'].str.split(' ').str.get(0).str.replace(',','')
    car=car[car['kms_driven'].str.isnumeric()]
    car['kms_driven']=car['kms_driven'].astype(int)
    # Drop rows with a missing fuel type
    car=car[~car['fuel_type'].isna()]
    # Keep only the first three words of the car name
    car['name']=car['name'].str.split(' ').str.slice(0,3).str.join(' ')
    car=car.reset_index(drop=True)
    # Remove listings priced at 6e6 or more
    car=car[car['Price']<6e6].reset_index(drop=True)
    car.to_csv('cleaned car.csv')
    #Splitting the features and target
    x=car.drop(columns='Price')
    y=car['Price']
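The cleaning steps on this slide can be gathered into one function and tried on a few rows. The following is a minimal sketch, assuming pandas is installed; the function name `clean` and the six sample rows are hypothetical and only imitate the kinds of dirty values listed in the `unique()` outputs. The `.copy()` calls are an addition that avoids assigning into a filtered view.

```python
import pandas as pd

# Hypothetical sample rows (not taken from the real dataset).
raw = pd.DataFrame({
    "name": ["Hyundai Santro Xing XO eRLX", "Maruti Suzuki Alto 800",
             "Ford Figo", "Tata Zest XM Diesel",
             "Mahindra Jeep CL550 MDI", "Honda City"],
    "company": ["Hyundai", "Maruti", "Ford", "Tata", "Mahindra", "Honda"],
    "year": ["2007", "sale", "2012", "2018", "2006", "2015"],
    "Price": ["80,000", "1,00,000", "Ask For Price", "4,25,000",
              "4,25,000", "5,00,000"],
    "kms_driven": ["45,000 kms", "22,000 kms", "41,000 kms", "Petrol",
                   "40 kms", "30,000 kms"],
    "fuel_type": ["Petrol", "Petrol", "Diesel", None, "Diesel", None],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the same filters and conversions as the report, in order."""
    # Keep rows whose year is purely numeric, then convert to int.
    df = df[df["year"].str.isnumeric()].copy()
    df["year"] = df["year"].astype(int)
    # Drop listings without a price and strip the thousands separators.
    df = df[df["Price"] != "Ask For Price"].copy()
    df["Price"] = df["Price"].str.replace(",", "").astype(int)
    # '45,000 kms' -> '45000'; rows such as 'Petrol' fail isnumeric().
    df["kms_driven"] = (
        df["kms_driven"].str.split(" ").str.get(0).str.replace(",", "")
    )
    df = df[df["kms_driven"].str.isnumeric()].copy()
    df["kms_driven"] = df["kms_driven"].astype(int)
    # Drop rows with a missing fuel type.
    df = df[~df["fuel_type"].isna()].copy()
    # Keep only the first three words of the name.
    df["name"] = df["name"].str.split(" ").str.slice(0, 3).str.join(" ")
    # Drop extreme prices (threshold taken from the report).
    df = df[df["Price"] < 6e6]
    return df.reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Only the Hyundai and Mahindra rows survive here: the other four are removed by the year, price, kilometre and fuel-type filters respectively.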
  • 64.
    x
         name                    company   year  kms_driven  fuel_type
    0    Hyundai Santro Xing     Hyundai   2007  45000       Petrol
    1    Mahindra Jeep CL550     Mahindra  2006  40          Diesel
    2    Hyundai Grand i10       Hyundai   2014  28000       Petrol
    3    Ford EcoSport Titanium  Ford      2014  36000       Diesel
    4    Ford Figo               Ford      2012  41000       Diesel
    ...  ...                     ...       ...   ...         ...
    811  Maruti Suzuki Ritz      Maruti    2011  50000       Petrol
    812  Tata Indica V2          Tata      2009  30000       Diesel
    813  Toyota Corolla Altis    Toyota    2009  132000      Petrol
    814  Tata Zest XM            Tata      2018  27000       Diesel
    815  Mahindra Quanto C8      Mahindra  2013  40000       Diesel
    816 rows × 5 columns
    from sklearn.model_selection import train_test_split
    x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2)
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.compose import make_column_transformer
    from sklearn.pipeline import make_pipeline
    ohe=OneHotEncoder()
    ohe.fit(x[['name','company','fuel_type']])
    OneHotEncoder()
    ohe.categories_
    [array(['Audi A3 Cabriolet', 'Audi A4 1.8', 'Audi A4 2.0', 'Audi A6 2.0',
  • 65.
           'Audi A8', 'Audi Q3 2.0', 'Audi Q5 2.0', 'Audi Q7', 'BMW 3 Series',
           'BMW 5 Series', 'BMW 7 Series', 'BMW X1', 'BMW X1 sDrive20d',
           'BMW X1 xDrive20d', 'Chevrolet Beat', 'Chevrolet Beat Diesel',
           'Chevrolet Beat LS', 'Chevrolet Beat LT', 'Chevrolet Beat PS',
           'Chevrolet Cruze LTZ', 'Chevrolet Enjoy', 'Chevrolet Enjoy 1.4',
           'Chevrolet Sail 1.2', 'Chevrolet Sail UVA', 'Chevrolet Spark',
           'Chevrolet Spark 1.0', 'Chevrolet Spark LS', 'Chevrolet Spark LT',
           'Chevrolet Tavera LS', 'Chevrolet Tavera Neo', 'Datsun GO T',
           'Datsun Go Plus', 'Datsun Redi GO', 'Fiat Linea Emotion',
           'Fiat Petra ELX', 'Fiat Punto Emotion', 'Force Motors Force',
           'Force Motors One', 'Ford EcoSport', 'Ford EcoSport Ambiente',
           'Ford EcoSport Titanium', 'Ford EcoSport Trend', 'Ford Endeavor 4x4',
           'Ford Fiesta', 'Ford Fiesta SXi', 'Ford Figo', 'Ford Figo Diesel',
           'Ford Figo Duratorq', 'Ford Figo Petrol', 'Ford Fusion 1.4',
           'Ford Ikon 1.3', 'Ford Ikon 1.6', 'Hindustan Motors Ambassador',
           'Honda Accord', 'Honda Amaze', 'Honda Amaze 1.2', 'Honda Amaze 1.5',
           'Honda Brio', 'Honda Brio V', 'Honda Brio VX', 'Honda City',
           'Honda City 1.5', 'Honda City SV', 'Honda City VX', 'Honda City ZX',
           'Honda Jazz S', 'Honda Jazz VX', 'Honda Mobilio', 'Honda Mobilio S',
           'Honda WR V', 'Hyundai Accent', 'Hyundai Accent Executive',
           'Hyundai Accent GLE', 'Hyundai Accent GLX', 'Hyundai Creta',
           'Hyundai Creta 1.6', 'Hyundai Elantra 1.8', 'Hyundai Elantra SX',
           'Hyundai Elite i20', 'Hyundai Eon', 'Hyundai Eon D', 'Hyundai Eon Era',
           'Hyundai Eon Magna', 'Hyundai Eon Sportz', 'Hyundai Fluidic Verna',
           'Hyundai Getz', 'Hyundai Getz GLE', 'Hyundai Getz Prime',
           'Hyundai Grand i10', 'Hyundai Santro', 'Hyundai Santro AE',
           'Hyundai Santro Xing', 'Hyundai Sonata Transform', 'Hyundai Verna',
           'Hyundai Verna 1.4', 'Hyundai Verna 1.6', 'Hyundai Verna Fluidic',
           'Hyundai Verna Transform', 'Hyundai Verna VGT', 'Hyundai Xcent Base',
           'Hyundai Xcent SX', 'Hyundai i10', 'Hyundai i10 Era',
           'Hyundai i10 Magna', 'Hyundai i10 Sportz', 'Hyundai i20',
           'Hyundai i20 Active', 'Hyundai i20 Asta', 'Hyundai i20 Magna',
           'Hyundai i20 Select', 'Hyundai i20 Sportz',
  • 66.
           'Jaguar XE XE', 'Jaguar XF 2.2', 'Jeep Wrangler Unlimited',
           'Land Rover Freelander', 'Mahindra Bolero DI', 'Mahindra Bolero Power',
           'Mahindra Bolero SLE', 'Mahindra Jeep CL550', 'Mahindra Jeep MM',
           'Mahindra KUV100', 'Mahindra KUV100 K8', 'Mahindra Logan',
           'Mahindra Logan Diesel', 'Mahindra Quanto C4', 'Mahindra Quanto C8',
           'Mahindra Scorpio', 'Mahindra Scorpio 2.6', 'Mahindra Scorpio LX',
           'Mahindra Scorpio S10', 'Mahindra Scorpio S4', 'Mahindra Scorpio SLE',
           'Mahindra Scorpio SLX', 'Mahindra Scorpio VLX', 'Mahindra Scorpio Vlx',
           'Mahindra Scorpio W', 'Mahindra TUV300 T4', 'Mahindra TUV300 T8',
           'Mahindra Thar CRDe', 'Mahindra XUV500', 'Mahindra XUV500 W10',
           'Mahindra XUV500 W6', 'Mahindra XUV500 W8', 'Mahindra Xylo D2',
           'Mahindra Xylo E4', 'Mahindra Xylo E8', 'Maruti Suzuki 800',
           'Maruti Suzuki A', 'Maruti Suzuki Alto', 'Maruti Suzuki Baleno',
           'Maruti Suzuki Celerio', 'Maruti Suzuki Ciaz', 'Maruti Suzuki Dzire',
           'Maruti Suzuki Eeco', 'Maruti Suzuki Ertiga', 'Maruti Suzuki Esteem',
           'Maruti Suzuki Estilo', 'Maruti Suzuki Maruti', 'Maruti Suzuki Omni',
           'Maruti Suzuki Ritz', 'Maruti Suzuki S', 'Maruti Suzuki SX4',
           'Maruti Suzuki Stingray', 'Maruti Suzuki Swift', 'Maruti Suzuki Versa',
           'Maruti Suzuki Vitara', 'Maruti Suzuki Wagon', 'Maruti Suzuki Zen',
           'Mercedes Benz A', 'Mercedes Benz B', 'Mercedes Benz C',
           'Mercedes Benz GLA', 'Mini Cooper S', 'Mitsubishi Lancer 1.8',
           'Mitsubishi Pajero Sport', 'Nissan Micra XL', 'Nissan Micra XV',
           'Nissan Sunny', 'Nissan Sunny XL', 'Nissan Terrano XL',
           'Nissan X Trail', 'Renault Duster', 'Renault Duster 110',
           'Renault Duster 110PS', 'Renault Duster 85', 'Renault Duster 85PS',
           'Renault Duster RxL', 'Renault Kwid', 'Renault Kwid 1.0',
           'Renault Kwid RXT', 'Renault Lodgy 85', 'Renault Scala RxL',
           'Skoda Fabia', 'Skoda Fabia 1.2L', 'Skoda Fabia Classic', 'Skoda Laura',
           'Skoda Octavia Classic', 'Skoda Rapid Elegance', 'Skoda Superb 1.8',
           'Skoda Yeti Ambition', 'Tata Aria Pleasure', 'Tata Bolt XM',
           'Tata Indica', 'Tata Indica V2', 'Tata Indica eV2',
  • 67.
           'Tata Indigo CS', 'Tata Indigo LS', 'Tata Indigo LX',
           'Tata Indigo Marina', 'Tata Indigo eCS', 'Tata Manza',
           'Tata Manza Aqua', 'Tata Manza Aura', 'Tata Manza ELAN', 'Tata Nano',
           'Tata Nano Cx', 'Tata Nano GenX', 'Tata Nano LX', 'Tata Nano Lx',
           'Tata Sumo Gold', 'Tata Sumo Grande', 'Tata Sumo Victa',
           'Tata Tiago Revotorq', 'Tata Tiago Revotron', 'Tata Tigor Revotron',
           'Tata Venture EX', 'Tata Vista Quadrajet', 'Tata Zest Quadrajet',
           'Tata Zest XE', 'Tata Zest XM', 'Toyota Corolla',
           'Toyota Corolla Altis', 'Toyota Corolla H2', 'Toyota Etios',
           'Toyota Etios G', 'Toyota Etios GD', 'Toyota Etios Liva',
           'Toyota Fortuner', 'Toyota Fortuner 3.0', 'Toyota Innova 2.0',
           'Toyota Innova 2.5', 'Toyota Qualis', 'Volkswagen Jetta Comfortline',
           'Volkswagen Jetta Highline', 'Volkswagen Passat Diesel',
           'Volkswagen Polo', 'Volkswagen Polo Comfortline',
           'Volkswagen Polo Highline', 'Volkswagen Polo Highline1.2L',
           'Volkswagen Polo Trendline', 'Volkswagen Vento Comfortline',
           'Volkswagen Vento Highline', 'Volkswagen Vento Konekt',
           'Volvo S80 Summum'], dtype=object),
     array(['Audi', 'BMW', 'Chevrolet', 'Datsun', 'Fiat', 'Force', 'Ford',
           'Hindustan', 'Honda', 'Hyundai', 'Jaguar', 'Jeep', 'Land', 'Mahindra',
           'Maruti', 'Mercedes', 'Mini', 'Mitsubishi', 'Nissan', 'Renault',
           'Skoda', 'Tata', 'Toyota', 'Volkswagen', 'Volvo'], dtype=object),
     array(['Diesel', 'LPG', 'Petrol'], dtype=object)]
    column_trans=make_column_transformer((OneHotEncoder(categories=ohe.categories_),
                                          ['name','company','fuel_type']),
                                         remainder='passthrough')
    lr=LinearRegression()
    pipe=make_pipeline(column_trans,lr)
    pipe.fit(x_train,y_train)
    Pipeline(steps=[('columntransformer',
                     ColumnTransformer(remainder='passthrough',
                                       transformers=[('onehotencoder',
                                                      OneHotEncoder(categories=[array(['Audi A3 Cabriolet', 'Audi A4 1.8', 'Audi A4 2.0', 'Audi A6 2.0',
  • 68.
           'Audi A8', 'Audi Q3 2.0', 'Audi Q5 2.0', 'Audi Q7', 'BMW 3 Series',
           'BMW 5 Series', 'BMW 7 Series', 'BMW X1', 'BMW X1 sDrive20d',
           'BMW X1 xDrive20d', 'Chevrolet Beat', 'Chevrolet Beat...
     array(['Audi', 'BMW', 'Chevrolet', 'Datsun', 'Fiat', 'Force', 'Ford',
           'Hindustan', 'Honda', 'Hyundai', 'Jaguar', 'Jeep', 'Land', 'Mahindra',
           'Maruti', 'Mercedes', 'Mini', 'Mitsubishi', 'Nissan', 'Renault',
           'Skoda', 'Tata', 'Toyota', 'Volkswagen', 'Volvo'], dtype=object),
     array(['Diesel', 'LPG', 'Petrol'], dtype=object)]),
                                                      ['name', 'company','fuel_type'])])),
                    ('linearregression', LinearRegression())])
    y_pred=pipe.predict(x_test)
    y_pred
    y_test
    322    210000
    204    500000
    42     284999
    606    500000
    513    159000
            ...
    801    465000
    711    200000
    731    300000
    757    150000
    379    130000
    Name: Price, Length: 164, dtype: int32
    r2_score(y_test,y_pred)
    0.6863234123258164
    # checking for maximum r2_score
    scores=[]
    for i in range(1000):
        x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=i)
        lr=LinearRegression()
  • 69.
        pipe=make_pipeline(column_trans,lr)
        pipe.fit(x_train,y_train)
        y_pred=pipe.predict(x_test)
        scores.append(r2_score(y_test,y_pred))
    import numpy as np
    np.argmax(scores)
    906
    scores[np.argmax(scores)]
    0.7768125045875028
    # Retraining the model on the split (random_state) that gave the highest r2_score
    x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.2,random_state=np.argmax(scores))
    lr=LinearRegression()
    pipe=make_pipeline(column_trans,lr)
    pipe.fit(x_train,y_train)
    y_pred=pipe.predict(x_test)
    r2_score(y_test,y_pred)
    0.8456515104452564
    # Predicting the price from a single row of input features
    pipe.predict(pd.DataFrame([['Maruti Suzuki Swift','Maruti',2019,100,'Petrol']],
                              columns=['name','company','year','kms_driven','fuel_type']))
    # Prediction
    array([459113.49353657])
    # Dumping the fitted pipeline to LinearRegressionModel.pkl with pickle for further development
    import pickle
    pickle.dump(pipe,open('LinearRegressionModel.pkl','wb'))
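Two choices in the code above are easier to see on a small example: the encoder receives its category lists from the full data set, so a split that happens to exclude a company can still be scored, and the whole fitted pipeline (encoder plus regressor) is what gets pickled. The following is a minimal sketch, assuming pandas and scikit-learn are installed; the five rows in `toy` are made-up values, and the pickle round trip is done in memory rather than through a file.

```python
import pickle

import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data, for illustration only.
toy = pd.DataFrame({
    "company": ["Maruti", "Maruti", "Ford", "Ford", "Tata"],
    "fuel_type": ["Petrol", "Diesel", "Diesel", "Petrol", "Diesel"],
    "year": [2015, 2012, 2014, 2010, 2018],
    "kms_driven": [30000, 60000, 45000, 80000, 10000],
    "Price": [350000, 250000, 400000, 150000, 500000],
})
cat_cols = ["company", "fuel_type"]
features = toy.drop(columns="Price")
target = toy["Price"]

# Learn the category lists from ALL rows, as the report does with `ohe`.
ohe = OneHotEncoder()
ohe.fit(features[cat_cols])

column_trans = make_column_transformer(
    (OneHotEncoder(categories=ohe.categories_), cat_cols),
    remainder="passthrough",
)
pipe = make_pipeline(column_trans, LinearRegression())

# Train on the first four rows only: "Tata" never appears in training.
pipe.fit(features.iloc[:4], target.iloc[:4])

# Because the categories were fixed up front, the unseen "Tata" row is
# encoded and scored instead of raising an unknown-category error.
held_out = features.iloc[[4]]
prediction = pipe.predict(held_out)

# Round trip through pickle: the restored object is the full pipeline,
# so it accepts the same raw columns and returns the same prediction.
restored = pickle.loads(pickle.dumps(pipe))
restored_prediction = restored.predict(held_out)
print(prediction, restored_prediction)
```

With so few rows the predicted number itself is meaningless; the point is only that encoding and the round trip succeed.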
  • 70. 5.3 Code Snippets
    1. home.html
    <!doctype html>
    <html lang="en">
    <head>
      <!-- Required meta tags -->
      <meta charset="utf-8">
      <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
      <link rel="stylesheet" href="static/css/style.css">
      <!-- Bootstrap CSS -->
      <link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/bootstrap@4.1.3/dist/css/bootstrap.min.css" integrity="sha384-MCw98/SFnGE8fJT3GXwEOngsV7Zt27NXFoaoApmYm81iuXoPkFOJwJ8ERdknLPMO" crossorigin="anonymous">
      <title>Car Price Predictor</title>
    </head>
    <body class="backgroundColor">
      <div class=" d-flex flex-column flex-md-row align-items-center p-3 px-md-4 mb-3 navbar-light" style="background-color: #e0f2f1;">
        <h5 class="my-0 mr-md-auto font-weight-normal"><b><h4>CAR PRICE PREDICTOR</h4></b></h5>
        <nav class="my-2 my-md-0 mr-md-3 ">
          <a class="p-2 text-dark" href="{{url_for('home')}}"><b>Home</b></a>
        </nav>
        <a class="btn btn-outline-primary" href="/logout">Log out</a>
      </div>
      <div class="container">
        <div class="row">
          <div class="card mt-50" style="width:100%;height:100%">
            <div class="card-header">
              <div class="col-12" style="text-align:center">
                <h1>Welcome to Car Price Predictor</h1>
  • 71.
              </div>
            </div>
            <div class="card-body">
              <form class="form" method="post" >
                <div class="col-10 form-group" style="text-align: center">
                  <label> <b>Select company: </b></label>
                  <select class="selectpicker form-control" id="company" name="company" required="1" onchange="load_car_models(this.id,'car_model')">
                    {% for company in companies %}
                    <option value="{{company}}">{{company}} </option>
                    {% endfor %}
                  </select>
                </div>
                <div class="col-10 form-group" style="text-align: center">
                  <label> <b>Select Model: </b></label>
                  <select class="selectpicker form-control" id="car_model" name="car_model" required="1">
                  </select>
                </div>
                <div class="col-10 form-group" style="text-align: center">
                  <label> <b>Select Year of Purchase: </b></label>
                  <select class="selectpicker form-control" id="year" name="year" required="1">
                    {% for year in years %}
                    <option value="{{year}}">{{year}} </option>
                    {% endfor %}
                  </select>
                </div>
                <div class="col-10 form-group" style="text-align: center">
                  <label> <b>Select Fuel Type: </b></label>
                  <select class="selectpicker form-control" id="fuel_type" name="fuel_type" required="1">
  • 72.
                    {% for fuel_type in fuel_types %}
                    <option value="{{fuel_type}}">{{fuel_type}} </option>
                    {% endfor %}
                  </select>
                </div>
                <div class="col-10 form-group" style="text-align: center">
                  <label> <b>Kilometers travelled: </b></label>
                  <input class="form-control" type="text" id="kms_driven" name="kms_driven" placeholder="Enter no. of kms travelled" >
                </div>
                <div class="col-10 form-group" style="text-align: center">
                  <button class="btn btn-primary btn-block btn-lg" onclick="send_data()" value="Predict">Predict Price</button>
                </div>
              </form>
              <br>
              <div class="row">
                <div class="col-12" style="text-align: center">
                  <h3><span id="prediction"></span> </h3>
                </div>
              </div>
            </div>
          </div>
        </div>
      </div>
  • 73.
    <script>
      // Refill the model dropdown with the models of the selected company
      function load_car_models(company_id,car_model_id)
      {
        var company= document.getElementById(company_id);
        var car_model= document.getElementById(car_model_id);
        car_model.value="";
        car_model.innerHTML="";
        {% for company in companies %}
        if(company.value == "{{company}}" )
        {
          {% for model in car_models %}
          {% if company in model %}
          var newOption = document.createElement("option");
          newOption.value="{{ model }}";
          newOption.innerHTML="{{ model }}";
          car_model.options.add(newOption);
          {% endif %}
          {% endfor %}
        }
        {% endfor %}
      }
      // Stop the browser from submitting the form and reloading the page
      function form_handler(event) {
        event.preventDefault();
      }
      function send_data()
      {
        document.querySelector('form').addEventListener('submit', form_handler);
        var fd= new FormData(document.querySelector('form'));
  • 74.
        var xhr=new XMLHttpRequest();
        xhr.open('POST', '/predict', true);
        document.getElementById("prediction").innerHTML="wait! predicting price...";
        xhr.onreadystatechange= function() {
          if(xhr.readyState == XMLHttpRequest.DONE) {
            document.getElementById("prediction").innerHTML="The Predicted Price is: "+ xhr.responseText + " Rs/-";
          }
        }
        xhr.onload=function(){};
        xhr.send(fd);
      }
    </script>
    <!-- Optional JavaScript -->
    <!-- jQuery first, then Popper.js, then Bootstrap JS -->
    <script src="https://code.jquery.com/jquery-3.3.1.slim.min.js" integrity="sha384-q8i/X+965DzO0rT7abK41JStQIAqVgRVzpbzo5smXKp4YfRvH+8abtTE1Pi6jizo" crossorigin="anonymous"></script>
    <script src="https://cdn.jsdelivr.net/npm/popper.js@1.14.3/dist/umd/popper.min.js" integrity="sha384-ZMP7rVo3mIykV+2+9J3UJ46jBk0WLaUAdn689aCwoqbBJiSnjAK/l8WvCWPIPm49" crossorigin="anonymous"></script>
    <script src="https://cdn.jsdelivr.net/npm/bootstrap@4.1.3/dist/js/bootstrap.min.js" integrity="sha384-ChfqqxuZUCnJSK3+MXmPNIyE6ZbWh2IMqE241rYiqJxyMiZ6OW/JmZQ5stwEULTy" crossorigin="anonymous"></script>
    </body>
    </html>
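The nested template loops in `load_car_models` emit one option per model whose name contains the company name, because `{% if company in model %}` is a substring test on strings. The same grouping can be sketched in plain Python as follows; the function name `models_by_company` and the sample lists are hypothetical and stand in for the `companies` and `car_models` values the view passes to the template.

```python
def models_by_company(companies, car_models):
    """Group model names under every company whose name occurs in them."""
    return {
        company: [model for model in car_models if company in model]
        for company in companies
    }


# Hypothetical sample values, shaped like the cleaned data set's columns.
companies = ["Ford", "Maruti", "Tata"]
car_models = [
    "Ford Figo",
    "Maruti Suzuki Swift",
    "Tata Zest XM",
    "Tata Indica V2",
]

grouped = models_by_company(companies, car_models)
for company, models in grouped.items():
    print(company, "->", models)
```

Since the match is by substring rather than by an explicit company column, a model is listed under any company whose name appears anywhere in the model string.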
  • 75. 2. App.py
    import pandas as pd
    #from flask import Flask, render_template, request, url_for,redirect,session
    import pickle
    import numpy as np
    from flask import *
    import flask_login
    import os
    from num2words import num2words
    import mysql.connector

    # Load the trained pipeline and the cleaned data set
    model=pickle.load(open("LinearRegressionModel.pkl",'rb'))
    car=pd.read_csv("cleaned car.csv")
    app=Flask(__name__)
    app.secret_key=os.urandom(24)
    # Connect to the MySQL database
    conn=mysql.connector.connect(
        host='localhost',
        user='root',
        password='Password123@',
        port='3306',
        database='database'
    )
    mycursor=conn.cursor()

    @app.route('/')
    def login():
        # Logged-in users go straight to the home page
        if 'user_id' in session:
            return redirect('/home')
        else:
            return render_template('login.html')

    @app.route('/register')
    def register():
        return render_template('register.html')