4. MOTIVATION
The product can be used in jam manufacturing
industries or similar factories where fruit
segregation must be done automatically,
without human intervention.
5. SCOPE
The system assumes that the metrics used for
prediction are already available for each fruit.
The current scope is to classify fruits by type,
but not by subtype.
6. DATASET USED
The fruits dataset was created by Dr. Iain Murray of the University of Edinburgh.
Professors at the University of Michigan then reformatted the data slightly.
7. FEATURES
The system uses three classification algorithms:
1. KNN
2. Naïve Bayes
3. Decision trees
The accuracy of each model is also shown.
The system can predict fruit labels for a new,
unlabelled dataset.
The system also has an interactive GUI.
LINK FOR SRS:
8. PRE PROCESSING
Scaling: The features in the dataset are scaled so
that all attributes carry equal weight. This is needed
because the values of the height and weight attributes
are much larger than the colour score.
Attribute Selection: Attributes that are not required,
such as the fruit subtype, are removed. Where attributes
are redundant, only one of them is kept.
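The two pre-processing steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes min-max scaling (the source does not say which scaler was used), and the column names and values are hypothetical.

```python
# Hypothetical records: column names and values are illustrative only.
rows = [
    {"label": "apple", "subtype": "type_a", "weight": 190.0, "height": 7.5, "colour_score": 0.55},
    {"label": "orange", "subtype": "type_b", "weight": 85.0, "height": 4.5, "colour_score": 0.80},
    {"label": "lemon", "subtype": "type_c", "weight": 130.0, "height": 8.5, "colour_score": 0.70},
]

# Attribute selection: "subtype" is deliberately left out of the feature list.
FEATURES = ["weight", "height", "colour_score"]


def min_max_scale(rows, features):
    """Return new rows whose selected features are rescaled to [0, 1]."""
    lo = {f: min(r[f] for r in rows) for f in features}
    hi = {f: max(r[f] for r in rows) for f in features}
    scaled = []
    for r in rows:
        out = {"label": r["label"]}
        for f in features:
            span = hi[f] - lo[f]
            # A constant column carries no information; map it to 0.0.
            out[f] = (r[f] - lo[f]) / span if span else 0.0
        scaled.append(out)
    return scaled


scaled_rows = min_max_scale(rows, FEATURES)
```

After scaling, weight, height and colour score all lie in the same range, so no single attribute dominates a distance calculation.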
9. CROSS VALIDATION STRATEGY
We use a hold-out split: the model is trained on 70%
of the dataset, and the remaining 30% is reserved for
testing the validity of the model.
The split is random. Randomly selected samples form the
training set, and the remaining samples are used to
evaluate the predictions of the trained model.
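A random 70/30 split of this kind might look like the sketch below. The function name, the fixed seed and the stand-in data are assumptions made for illustration; the source does not describe how the split was implemented.

```python
import random


def train_test_split(samples, train_fraction=0.7, seed=0):
    """Shuffle a copy of `samples` and split it into (train, test)."""
    shuffled = list(samples)
    # A seeded generator makes the random split reproducible.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


samples = list(range(10))  # stand-ins for ten fruit records
train, test = train_test_split(samples)
```

Every sample ends up in exactly one of the two sets, so the test set is never seen during training.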
10. ALGORITHMS USED
1. K nearest neighbour: The distance from the new data point to every
training data point is calculated, and the k nearest training points are
selected. The new point is then assigned to the class to which the majority
of those k points belong.
2. Naïve Bayes: A probabilistic model based on
Bayes' theorem. It assumes that every pair of
features is independent of each other, given the class.
3. Decision trees: The problem is solved using a
tree representation, with the best (most informative)
attributes selected for the upper levels of the tree.
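The k nearest neighbour procedure described above can be sketched in a few lines. This is an illustration only: it assumes Euclidean distance (the source does not name the distance metric), and the function name and data points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    # Distance from the query to every training point (Euclidean, by assumption).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest training points.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote over the labels of those k points.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, already-scaled (height, colour score) pairs.
train_points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
train_labels = ["lemon", "lemon", "lemon", "apple", "apple", "apple"]

prediction = knn_predict(train_points, train_labels, query=(0.2, 0.2), k=3)
```

Because every prediction scans the whole training set, the method stores the data rather than building a compact model, which is why it is often called memory based.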
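For Naïve Bayes, the independence assumption means the per-feature likelihoods are simply multiplied (or, in log space, added). The sketch below assumes a Gaussian likelihood for each numeric feature; the source does not say which Naïve Bayes variant was used, and all names and data here are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(points, labels):
    """Estimate the prior, feature means and feature variances of each class."""
    by_class = defaultdict(list)
    for point, label in zip(points, labels):
        by_class[label].append(point)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # The small constant avoids division by zero for constant features.
        variances = [
            sum((x - m) ** 2 for x in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(points), means, variances)
    return model


def predict_gaussian_nb(model, query):
    """Return the class with the highest log posterior for `query`."""
    def log_posterior(label):
        prior, means, variances = model[label]
        # Independence assumption: per-feature log likelihoods are summed.
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(query, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=log_posterior)


# Hypothetical, already-scaled (height, colour score) pairs.
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.25), (0.9, 0.8), (0.8, 0.9), (0.85, 0.75)]
labels = ["lemon", "lemon", "lemon", "apple", "apple", "apple"]

nb_model = fit_gaussian_nb(points, labels)
nb_prediction = predict_gaussian_nb(nb_model, (0.9, 0.9))
```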
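For decision trees, "selecting the best attribute" for the top of the tree can be illustrated by scoring every candidate split and keeping the purest one. The sketch below assumes Gini impurity as the score; the source does not state which criterion was used, and the helper names and data are hypothetical. It finds only the root split, not a full tree.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means pure)."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def best_split(points, labels):
    """Return (feature_index, threshold) with the lowest weighted Gini impurity.

    Returns None when no feature has two distinct values to split on.
    """
    best = None  # (impurity, feature_index, threshold)
    for feature in range(len(points[0])):
        values = sorted(set(p[feature] for p in points))
        # Candidate thresholds are midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [l for p, l in zip(points, labels) if p[feature] <= threshold]
            right = [l for p, l in zip(points, labels) if p[feature] > threshold]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or impurity < best[0]:
                best = (impurity, feature, threshold)
    return None if best is None else (best[1], best[2])


# Hypothetical data: feature 0 is uninformative, feature 1 separates the classes.
points = [(0.5, 0.1), (0.4, 0.2), (0.6, 0.15), (0.5, 0.9), (0.4, 0.8), (0.6, 0.85)]
labels = ["lemon", "lemon", "lemon", "apple", "apple", "apple"]

root_feature, root_threshold = best_split(points, labels)
```

A full tree would apply the same search recursively to the left and right subsets until each leaf is pure or a depth limit is reached.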
13. COMPARISON
          KNN                     Naïve Bayes                           Decision Tree
Accuracy  0.9                     0.65                                  0.8
Detail    Memory-based technique  Considers all attributes independent  Follows SOP (sum-of-products) representation
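The accuracy figures are presumably the fraction of held-out test samples that each model labels correctly; the source does not define the metric, so this is an assumption. Under that assumption, the calculation is a short function. The labels below are hypothetical and are not results from the project.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("cannot compute accuracy of an empty test set")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical test-set labels, not results from the project.
true_labels = ["apple", "lemon", "orange", "apple", "lemon"]
predicted = ["apple", "lemon", "apple", "apple", "lemon"]
score = accuracy(true_labels, predicted)  # 4 of 5 predictions match
```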