Venkat Java Projects
Mobile: +91 9966499110
Visit: www.venkatjavaprojects.com  Email: venkatjavaprojects@gmail.com
Prediction of Quality for Different Type of Wine based on Different
Feature Sets Using Supervised Machine Learning Techniques
In this paper the authors predict the quality of wine using supervised machine
learning algorithms such as SVM, Random Forest, and Naïve Bayes. The prediction
accuracy of all these algorithms can be improved by adding feature selection
algorithms such as the Genetic Algorithm (GA) or Simulated Annealing (SA).
Feature selection algorithms are applied to the dataset to remove non-relevant
attributes and records with missing values, keeping only those attributes that
matter for making predictions. Using feature selection we can shrink the dataset
by discarding non-relevant data, which makes prediction both more accurate and
faster.
A genetic algorithm works the way evolution works on chromosomes: it keeps
relevant genes to form new offspring and discards unhealthy or non-relevant
ones. GA iterates over the dataset looking for non-relevant attributes through
fitness evaluation, reproduction, and mutation; only attributes with high
fitness (those related to more of the dataset's values) are used for
reproduction and mutation, while unfit attributes are removed.
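The GA loop described above can be sketched in a few lines with scikit-learn. This is a minimal illustration, not the paper's actual code: the synthetic dataset stands in for the wine data, and the population size, mutation rate, and generation count are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(42)
# stand-in for the 11-attribute wine dataset
X, y = make_classification(n_samples=200, n_features=11, n_informative=5,
                           random_state=42)

def fitness(mask):
    """Cross-validated accuracy of an SVM on the selected feature subset."""
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(SVC(kernel='rbf'), X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(10, X.shape[1]))   # binary feature masks
for gen in range(5):
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-4:]]        # keep only the fittest masks
    children = []
    for _ in range(len(pop)):
        a, b = parents[rng.integers(0, len(parents), 2)]
        cut = rng.integers(1, X.shape[1])
        child = np.concatenate([a[:cut], b[cut:]])  # reproduction: single-point crossover
        flip = rng.random(X.shape[1]) < 0.1         # mutation: flip bits with low probability
        child[flip] = 1 - child[flip]
        children.append(child)
    pop = np.array(children)

best = max(pop, key=fitness)
print("selected features:", np.flatnonzero(best))
```

Each generation discards the unfit masks, so over iterations the population converges toward subsets of attributes that score well.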
Simulated annealing (SA) is a global search/selection method that makes small
random changes (perturbations) to an initial candidate solution (here, a subset
of dataset attributes). If the performance value of the perturbed solution is
better than that of the previous one, the new solution is accepted. If not, an
acceptance probability is computed from the difference between the two
performance values and the current iteration of the search. In this way a
sub-optimal solution can be accepted on the off-chance that it eventually
leads to a better solution, or a better attribute subset, in subsequent
iterations.
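The accept/reject rule above can be sketched as follows. Again this is an illustrative sketch on synthetic data, not the project's code; the iteration count and cooling rate are assumptions.

```python
import math
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=11, n_informative=5,
                           random_state=0)

def score(mask):
    if mask.sum() == 0:
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask.astype(bool)], y, cv=3).mean()

mask = np.ones(X.shape[1], dtype=int)         # start with all features selected
best, best_score = mask.copy(), score(mask)
cur_score, T = best_score, 1.0
for it in range(30):
    cand = mask.copy()
    j = rng.integers(X.shape[1])
    cand[j] = 1 - cand[j]                     # small random perturbation: toggle one feature
    s = score(cand)
    # accept improvements outright; accept worse moves with a probability
    # driven by the score difference and the current temperature
    if s > cur_score or rng.random() < math.exp((s - cur_score) / T):
        mask, cur_score = cand, s
        if s > best_score:
            best, best_score = cand, s
    T *= 0.9                                  # cooling schedule lowers acceptance over time
print("best subset:", np.flatnonzero(best), "cv accuracy:", round(best_score, 3))
```

As the temperature falls, the search becomes less willing to accept worse subsets, so it settles near the best attribute subset found.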
SVM Algorithm: Machine learning involves predicting and classifying data, and
to do so we employ various machine learning algorithms according to the
dataset. SVM, or Support Vector Machine, is a linear model for classification
and regression problems. It can solve linear and non-linear problems and works
well for many practical tasks. The idea of SVM is simple: the algorithm creates
a line or a hyperplane which separates the data into classes. In machine
learning, the radial basis function kernel, or RBF kernel, is a popular kernel
function used in various kernelized learning algorithms; in particular, it is
commonly used in support vector machine classification. As a simple example,
for a classification task with only two features, you can think of a hyperplane
as a line that linearly separates and classifies a set of data.
Intuitively, the further from the hyperplane our data points lie, the more
confident we are that they have been correctly classified. We therefore want
our data points to be as far away from the hyperplane as possible, while still
being on the correct side of it.
So when new testing data is added, whichever side of the hyperplane it lands on
decides the class that we assign to it.
How do we find the right hyperplane?
Or, in other words, how do we best segregate the two classes within the data?
The distance between the hyperplane and the nearest data point from either set
is known as the margin. The goal is to choose a hyperplane with the greatest
possible margin between the hyperplane and any point within the training set,
giving a greater chance of new data being classified correctly.
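Training an RBF-kernel SVM and reading off its distances from the hyperplane looks like this in scikit-learn; the synthetic data and split ratio are stand-ins for the wine dataset.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=11, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=1)

clf = SVC(kernel='rbf', C=1.0, gamma='scale')   # the RBF kernel described above
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print("test accuracy:", round(acc, 3))
# decision_function returns a signed distance from the hyperplane:
# larger magnitude means farther from the boundary, i.e. higher confidence
print("distances:", clf.decision_function(X_te[:3]).round(2))
```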
Random Forest Algorithm: Random Forest is an ensemble algorithm, which means it
internally uses multiple classifiers to build an accurate model. Specifically,
it trains many decision trees on the data and combines their votes to classify
each record.
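The ensemble-of-trees idea is directly visible in scikit-learn's implementation; the tree count below is an arbitrary choice and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=11, random_state=1)
clf = RandomForestClassifier(n_estimators=100, random_state=1)  # 100 decision trees
clf.fit(X, y)
# the internal decision-tree models the ensemble votes over
print(len(clf.estimators_), "trees in the forest")
```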
Naive Bayes: Naive Bayes, one of the most commonly used algorithms for
classification problems, is a simple probabilistic classifier based on Bayes'
theorem. It determines the probability of each feature occurring in each class
and returns the class with the highest probability.
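The "highest probability wins" behaviour can be seen by inspecting the per-class probabilities directly; Gaussian Naive Bayes is assumed here since the wine attributes are continuous.

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, n_features=11, random_state=1)
clf = GaussianNB().fit(X, y)

# per-class probabilities for the first sample; the prediction is the argmax
probs = clf.predict_proba(X[:1])[0]
print("class probabilities:", probs.round(3))
print("prediction:", clf.predict(X[:1])[0])
```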
Dataset Information
We downloaded the wine dataset from the UCI machine learning repository and
saved it inside the 'dataset' folder. Each machine learning algorithm takes the
dataset and trains a model after splitting it into train and test parts: the
train part is used to train the model, and the test part is applied to the
trained model to predict its class values.
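Loading and splitting works as below. The file path and column names follow the UCI wine-quality layout but are assumptions about this project's folder; the inline rows are a tiny stand-in so the sketch runs without the file.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# In the project the file sits in the 'dataset' folder, e.g.:
#   df = pd.read_csv('dataset/winequality-red.csv', sep=';')  # UCI files are ';'-separated
# Stand-in rows with the same column layout, for illustration:
df = pd.DataFrame({
    'fixed acidity': [7.4, 7.8, 7.8, 11.2],
    'alcohol':       [9.4, 9.8, 9.8, 9.8],
    'quality':       [5, 5, 5, 6],
})
X = df.drop(columns='quality')      # attributes used for prediction
y = df['quality']                   # class value to predict
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25,
                                                    random_state=0)
print(len(X_train), "train rows,", len(X_test), "test rows")
```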
Screenshots
To run this project, double-click the 'run.bat' file to get the screen below.
In the above screen, click the 'Upload White/Red Wine Dataset' button to upload
the red or white wine dataset.
In the above screen I am uploading the red wine dataset; after the upload we
get the screen below.
Now click the 'Run SVM with GA' button to run the SVM algorithm with genetic
feature selection. After clicking this button, five empty windows will open;
just close all five and keep the original window running.
In the above screen we got 60% accuracy for SVM with GA. Now run SVM with SA
(Simulated Annealing).
In the above screen, SVM with SA got 50% accuracy. Now run Random Forest
with GA.
With Random Forest and GA we got 30% accuracy. Now run Random Forest with SA.
In the above screen, Random Forest with SA also got the same accuracy; now
click Naïve Bayes with GA.
In the above screen, Naïve Bayes with GA got 40% accuracy; now run Naïve Bayes
with SA.
Naïve Bayes with SA got 40% accuracy. Now click the 'Accuracy Graph' button to
get the accuracy graph for all algorithms.
In the above graph the x-axis represents the algorithm name and the y-axis the
accuracy of that algorithm; from the graph we can conclude that SVM with GA
achieved better accuracy than all the other algorithms.
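A graph like the one above can be reproduced with matplotlib from the accuracies reported in the screenshots; the output filename is an assumption, not part of the project.

```python
import os
import matplotlib
matplotlib.use('Agg')  # render off-screen, no GUI window needed
import matplotlib.pyplot as plt

# accuracies reported in the walkthrough above
names = ['SVM GA', 'SVM SA', 'RF GA', 'RF SA', 'NB GA', 'NB SA']
acc = [60, 50, 30, 30, 40, 40]

plt.bar(names, acc)
plt.xlabel('Algorithm')
plt.ylabel('Accuracy (%)')
plt.title('Accuracy comparison of all algorithms')
plt.savefig('accuracy_graph.png')   # hypothetical output file
```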