CS-422 THESIS (1).pptx

CS-422 THESIS/PROJECT PRESENTATION (MID-TERM)
Title: Mobile Price Prediction and Classification Using ML
Submitted by:
Vikrant Rana (1000013732), 190184102
Aryaman Abrol (1000013790), 190184100
CSE (AI-DS), 4th Year, 8th Semester
PG-134
Project Guide/Advisor:
Dr. Pooja Gupta (Assistant Professor), Masters of Computer Applications
Email: pooja.gupta@dit.university.edu.in
Submitted to:
Dr. Anuj Kumar Yadav (Assistant Professor), Masters of Computer Applications

CONTENTS
 1.Project Overview
 2.Project Objective
 3.Project Introduction
 4.Methodology
 5.Result Summary
 6.Comparative Study
 7.Conclusion
 8.References

Project Overview
Mobile phones come in a wide range of prices,
features, and specifications. Price estimation
and prediction are an important part of
consumer strategy, and setting the right price
is critical to a product's market success. A
newly launched product must be priced so that
consumers find it worth buying.
This project helps users sort mobiles by their
features and tells them whether a given product
is Economical or Expensive.

Project Objective
 The main aim of this research is to estimate whether a mobile with certain features
will be Economical or Expensive.
 The dataset is taken from www.kaggle.com. Several feature selection techniques are
used to find and eliminate redundant and unimportant features with the least
computational complexity.
 Several classifiers are used to obtain the highest possible accuracy.
 Results are compared by the maximum accuracy attained and the minimum number of
features used, and the conclusion identifies the best feature selection algorithm and
classifier for the given dataset.
 This research can be applied to any kind of marketing or business to find the best
product (the most features for the least money).
 Future work should continue this research to develop more sophisticated solutions
and more precise price estimation tools.

Project Introduction
Price is the most influential attribute in marketing and business. The first thing a
customer asks about is the price of an item.
Every customer's initial concern is whether they can afford the item they are
looking for.
The primary goal of this work is therefore to estimate prices at home. This study takes
only the first step towards that goal.
Artificial intelligence, the field that enables machines to respond intelligently to
queries, has grown significantly in recent years.
Machine learning provides many of its most effective techniques, such as classification,
regression, supervised learning, and unsupervised learning.
A variety of tools are available for machine learning tasks, including MATLAB, Python,
Cygwin, and WEKA. Any classifier, such as Decision Tree or Naive Bayes, can be used.
Various feature selection algorithms are available to keep only the best features and
reduce the dataset.

Project Introduction
This reduces the computational complexity of the problem. Because this is an
optimization problem, many optimization techniques are also used to lower the
dimensionality of the dataset.
Using historical data to forecast the prices of existing and newly launched products
is an interesting research area for machine learning researchers.
Sameerchand Pudaruth [1] predicted the prices of used cars in Mauritius. He applied
several techniques, including multiple linear regression, k-nearest neighbours (KNN),
decision trees, and Naive Bayes, and obtained comparable results with all of them.
Research has shown that the most popular algorithms, such as Decision Tree and Naive
Bayes, are unable to handle, classify, and predict numeric values.

Project Methodology
 The experiment is performed using WEKA (Waikato Environment for Knowledge
Analysis). The main steps of machine learning are as follows:
Data Collection, Dimensionality Reduction, Classification
Data Collection-
Ten features of mobiles are collected from www.kaggle.com: Category (whether
the mobile is made by Apple, Samsung, Lenovo, Nokia, etc.), Memory card slot
(present or not), Display size (inches), Weight (g), Thickness (mm), Internal
memory size (GB), Camera pixels (MP), Video quality, RAM size (GB), and
Battery (mAh). These attributes have real values.

Project Methodology
We will proceed with reading the data, and
then perform data analysis. The practice of
examining data using analytical or statistical
methods in order to identify meaningful
information is known as data analysis. After
data analysis, we will find out the data
distribution and data types. We will train 4
classification algorithms to predict the output.
We will also compare the outputs. Let us get
started with the project implementation.
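
The reading and analysis steps described above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library; the column names (battery_power, blue, m_dep, price_range) and the four sample rows are assumptions made up for the example, not the actual Kaggle file, and the project itself lists pandas for this step.

```python
import csv
import io
from collections import Counter
from statistics import mean

# Hypothetical sample rows; column names and values are assumptions,
# not taken from the real dataset.
SAMPLE_CSV = """battery_power,blue,m_dep,price_range
842,0,0.6,1
1021,1,0.7,2
563,1,0.9,2
1821,1,0.8,3
"""


def load_rows(text):
    """Parse CSV text into a list of dicts with numeric (float) values."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: float(value) for name, value in row.items()} for row in reader]


def summarize(rows, column):
    """Return (min, mean, max) of one numeric column."""
    values = [row[column] for row in rows]
    return min(values), mean(values), max(values)


rows = load_rows(SAMPLE_CSV)
battery_summary = summarize(rows, "battery_power")
# Count how many phones fall into each price class (the data distribution).
class_counts = Counter(int(row["price_range"]) for row in rows)

print(battery_summary)
print(class_counts)
```

With a real file, the same two helpers would show the range of each feature and whether the price classes are balanced before any classifier is trained.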

Project Methodology
Here we can see several data features
and their distributions, for example
Battery Power and Bluetooth.

Project Methodology
Libraries used are-
1. NumPy: a library for the Python
programming language that adds
support for large, multi-dimensional
arrays and matrices, along with a large
collection of high-level mathematical
functions to operate on these arrays.
2. Pandas: a software library written
for the Python programming language
for data manipulation and analysis.
3. Seaborn: a library for making
statistical graphics in Python. It builds on
top of matplotlib and integrates closely
with pandas data structures.

Project Methodology
4. Matplotlib: a plotting library for the Python programming language
and its numerical mathematics extension NumPy.
5. Confusion Matrix: a table that summarizes the performance of a
classification algorithm by counting, for each actual class, how many
samples were assigned to each predicted class.
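
As a rough illustration of how a confusion matrix is built and how accuracy is read from it, here is a minimal sketch in plain Python. The label lists are made-up examples, not project results, and the function names are hypothetical; in practice a library routine would normally be used instead.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return a matrix with rows = actual class, columns = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Fraction of samples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Made-up labels for illustration only.
labels = ["Economical", "Expensive"]
actual = ["Economical", "Economical", "Expensive", "Expensive", "Expensive"]
predicted = ["Economical", "Expensive", "Expensive", "Expensive", "Economical"]

matrix = confusion_matrix(actual, predicted, labels)
print(matrix)            # [[1, 1], [1, 2]]
print(accuracy(matrix))  # 0.6
```

The diagonal holds the correct predictions, so the accuracy figures reported later correspond to the diagonal sum divided by the total number of test samples.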

Project Methodology
Classification-
 The algorithms and models we used are as follows:
 Logistic Regression
 Decision Tree
 Random Forest
 Support Vector Machines
 KNN
 Gaussian NB
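
To make one of the listed models concrete, the following is a minimal sketch of k-nearest neighbours (KNN) classification in plain Python. The two features (RAM in GB, battery in Ah) and the training points are hypothetical values chosen for illustration, and `knn_predict` is an illustrative name, not the implementation used in the project.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict by majority vote among the k training points nearest to query."""
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: (RAM in GB, battery in Ah).
# Real features should be scaled so that no single feature dominates the distance.
train_X = [(1, 2.0), (2, 2.5), (2, 3.0), (6, 4.0), (8, 4.5), (8, 5.0)]
train_y = ["Economical"] * 3 + ["Expensive"] * 3

print(knn_predict(train_X, train_y, (3, 3.0)))  # Economical
print(knn_predict(train_X, train_y, (7, 4.2)))  # Expensive
```

The other models in the list follow the same fit-then-predict pattern, which is what makes their confusion matrices directly comparable.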

Project Methodology
 Random Forest Confusion Matrix-
 Output-

Project Methodology
 Gaussian NB Confusion Matrix-

Project Methodology
 Support Vector Machine (SVM)-

Project Methodology
 Logistic Regression

Project Methodology
 Decision tree

Result Summary
 To summarize the work, all results and their graphs are presented for
comparative study.

Result Summary
 We first selected 4 attributes using the InfoGainAttributeEval algorithm:
 Battery power
 M_dep
 Bluetooth
 Price range
 These attributes were classified with the algorithms and models above, and
their accuracies were compared.
 The highest accuracy, 95%, was obtained with SVM (from its confusion matrix).
 The lowest accuracy, 64%, was obtained with Logistic Regression.
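
InfoGainAttributeEval scores an attribute by its information gain with respect to the class. The sketch below shows that calculation in plain Python under simplifying assumptions: attribute values are already discrete (numeric attributes would need to be discretized first), and the tiny dataset is invented for illustration rather than taken from the project data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Class entropy minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Invented example: battery level separates the classes, Bluetooth does not.
price_class = ["Economical", "Economical", "Expensive", "Expensive"]
battery_level = ["low", "low", "high", "high"]
bluetooth = [0, 1, 0, 1]

print(information_gain(battery_level, price_class))  # 1.0
print(information_gain(bluetooth, price_class))      # 0.0
```

Attributes are then ranked by this score and the top-ranked ones are kept.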

Comparative Study
 Comparison in machine learning is done in terms of maximum accuracy and
minimum number of features selected. Higher accuracy means more data is
classified correctly, while fewer features mean less memory and lower
computational complexity.
 Comparing the results, the maximum accuracy achieved is 95%, with the
WrapperattributEval algorithm for feature selection and Decision Tree as the
classifier. Only the two best features (display size and memory in GB) out of
ten are selected, so this is the best combination for this specific dataset.
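
The comparison rule described here (highest accuracy first, then fewest features) can be sketched as below. The accuracies are the figures quoted in the slides; the feature counts for SVM and Logistic Regression assume that both were trained on the four InfoGainAttributeEval attributes, which is an interpretation rather than something the slides state explicitly.

```python
# (name, accuracy, number of features) -- feature counts for the first two
# entries are an assumption based on the Result Summary slide.
results = [
    ("SVM", 0.95, 4),
    ("Logistic Regression", 0.64, 4),
    ("WrapperattributEval + Decision Tree", 0.95, 2),
]


def best_combination(results):
    """Pick the entry with the highest accuracy; break ties by fewer features."""
    return max(results, key=lambda r: (r[1], -r[2]))


best = best_combination(results)
print(best)
```

Under these assumptions, the tie at 95% is broken in favour of the combination that uses only two features.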

CONCLUSION
 The feature selection algorithms and classifiers give comparable results, except
for the combination of WrapperattributEval and the Decision Tree J48 classifier,
which achieved the maximum accuracy while selecting the fewest but most
appropriate features. Note that in forward selection, adding irrelevant or
redundant features to the dataset decreases the efficiency of both classifiers,
while in backward selection, removing an important feature from the dataset
also decreases efficiency. The main reason for the low accuracy rate is the
small number of instances in the dataset. It should also be kept in mind that
converting a regression problem into a classification problem introduces
additional error.
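
The last point can be illustrated with a small sketch of how a continuous price becomes a class label. The threshold and prices below are hypothetical values chosen only to show the effect, and `price_class` is an illustrative helper, not part of the project code.

```python
def price_class(price, threshold):
    """Map a continuous price onto the two classes used in this project."""
    return "Expensive" if price >= threshold else "Economical"


THRESHOLD = 200  # hypothetical cut-off, in arbitrary currency units

# Two nearly identical prices end up in different classes...
print(price_class(199, THRESHOLD))  # Economical
print(price_class(201, THRESHOLD))  # Expensive

# ...while two very different prices become indistinguishable.
print(price_class(201, THRESHOLD) == price_class(999, THRESHOLD))  # True
```

Any information about how far a price lies from the threshold is lost once it is binned, which is one source of the extra error mentioned above.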

OUTCOMES OF THE WORK
 Cost prediction is a very important factor in marketing and business. The same
procedure can be used to predict the cost of all types of products, for example
cars, food, medicine, and laptops. The best marketing strategy is to find the
optimal product (minimum cost and maximum specifications), so products can be
compared in terms of their specifications, cost, manufacturer, and so on. By
specifying an economic range, a good product can be suggested to a customer.

References
 [1] Sameerchand Pudaruth, "Predicting the Price of Used Cars using Machine
Learning Techniques", International Journal of Information & Computation
Technology, ISSN 0974-2239, Volume 4, Number 7 (2014), pp. 753-764.
 [2] Shonda Kuiper, "Introduction to Multiple Regression: How Much Is Your Car
Worth?", Journal of Statistics Education, November 2008.
 [3] Mariana Listiani, "Support Vector Regression Analysis for Price Prediction
in a Car Leasing Application", Master Thesis, Hamburg University of Technology, 2009.
 [4] V. Limsombunchai, "House Price Prediction: Hedonic Price Model vs.
Artificial Neural Network", New Zealand Agricultural and Resource Economics
Society Conference, New Zealand, pp. 25-26, 2004.
 [5] Kanwal Noor and Sadaqat Jan, "Vehicle Price Prediction System using Machine
Learning Techniques", International Journal of Computer Applications (0975-8887),
Volume 167, No. 9, June 2017.
 [6] Mobile data and specifications, available online from
https://www.gsmarena.com/ (last accessed Friday, December 22, 2017, 6:14:54 PM).

References
 [7] Introduction to dimensionality reduction, A computer science portal for Geeks,
https://www.geeksforgeeks.org/dimensionality-reduction/ (last accessed
Monday, Jan 22, 2018, 3 PM).
 [8] Ethem Alpaydın, Introduction to Machine Learning, Third Edition, The MIT
Press, Cambridge, Massachusetts; London, England, 2004.
 [9] InfoGainAttributeEval, Weka, available online from
http://weka.sourceforge.net/doc.dev/weka/attributeSelection/InfoGainAttributeEval.html
(last accessed Jan 2018).
 [10] Thu Zar Phyu and Nyein Nyein Oo, "Performance Comparison of Feature Selection
Methods", MATEC Web of Conferences 42 (2016).
 Dataset taken from www.kaggle.com
