Image_Processing_based_ML_Framework_for_Leaf_Classification_and_Disease_Detection.pdf

Image Processing based ML Framework for Leaf
Classification and Disease Detection
Abhinav Venkatadri, Akhila Jagarlapudi, Pranjal Ranjan, Rizwan Ahmed Ansari
Department of Electrical Engineering
Veermata Jijabai Technological Institute
Mumbai, India
abhinavvenkatadri@ieee.org, akhila.j@ieee.org, pranjalranjan@ieee.org, rizwan.vjti@ieee.org
Abstract— Image processing techniques are employed in
this paper to supplement machine learning models for
categorizing leaf species and detecting unhealthy spots in them.
Multiple public datasets were mixed with a self-curated dataset
of leaves from a homemade garden for the same. A set of 17
distinct features are extracted from background-subtracted
leaf photos to capture information about texture, colour, and
shape for leaf classification. These characteristics are used to
train machine learning models such as Support Vector
Machine and K-Nearest Neighbors, which do not require
hardware intensive training. These models are trained on
several dataset combinations, including the merged one, and
then compared using precision, recall, and F1-score measures.
Keywords— Leaf Classification, Disease Segmentation, Otsu
Thresholding, Image Processing
I. INTRODUCTION
Agriculture is the most predominant occupation in India.
The native south-Asian plants are, of course, indigenous
here. Initially, the crop was grown and examined on personal
grounds using only naked sight. This art of care was passed
down through generations, despite difficulties with retention
and learning. The study of plants and their illnesses were
documented and pursued across time. This research presents
a method for supplementing machine learning with image
processing techniques to make plant pathology easier to
study. The emphasis for categorising leaves is to minimize
the dimensionality of the input by extracting only the most
important & relevant information. Texture, colour, and form
have been removed, making it easier for the ML model to
learn and train on the data and as a result, requiring less
computing cost than the current existing methods [1]. This
method differs from previous methods that focused solely on
obtaining shape-based data [2]. Furthermore, the second
section of the research focuses on disease detection.
Currently, most disease detection methods use a large
amount of computational resources [3], and deep learning
models with CNNs have been employed to categorise
diseases. The concept of transforming leaf photos into
distinct colour spaces and applying the concept of
thresholding is proposed in this study.
II. METHODOLOGY
A feature vector was generated from the leaf image and
subjected to various machine learning models to identify the
type of leaf from its image. For this, the data collection
comprised of Flavia, Mendeley, Diseased Mendeley, and
Garden Images.
Fig. 1. (a) Garden Dataset (b) Flavia Dataset (c) Diseased Mendeley
Dataset (d) Mendeley Dataset
Fig. 2. Flowchart Of The Proposed Model
2022 International Conference on Signal and Information Processing (IConSIP)
College of Engineering Pune, India. Aug 26-27, 2022
978-1-7281-6885-2/22/$31.00 ©2022 IEEE 1
2022
International
Conference
on
Signal
and
Information
Processing
(IConSIP)
|
978-1-7281-6885-2/22/$31.00
©2022
IEEE
|
DOI:
10.1109/ICONSIP49665.2022.10007506
Authorized licensed use limited to: Indian Institute of Technology Indore. Downloaded on April 06,2024 at 17:39:55 UTC from IEEE Xplore. Restrictions apply.

A total of 17 criteria were assessed, including the colour,
texture, and shape of a leaf. The Otsu Thresholding
technique was used to separate the leaf from the background.
In addition, gaussian blur was used to remove noise.
A. Pre-Processing
All photos were downsized to a standard size of
1600*1200*3 before being converted from RGB to grayscale
for additional thresholding. A gaussian blur with kernel size
(55,55) was used to minimise the noise in this image. This
noise-reduced image is then thresholded using Otsu's
approach [7] to produce a binary image that separates the
leaf body from the background. For more uniform masking,
holes were also closed. This mask is applied to the original
leaf image, as illustrated in Fig. 3(e), to ensure that the
background does not influence leaf classification.
B. Leaf Classification
A total of 17 features were extracted from the leaf that
was used in classifying it into 42 different classes. Color,
texture, and shape are the most noticeable features.
1) Color Based Feature Extraction
The extracted colour features were the mean value of the
R component, the mean value of the G component, and the
mean value of the B component. In addition, the standard
deviation of these components, namely R, G, and B, was
taken into account.
Fig. 3. (a) Input Image (b) Grayscale Image (c) Noise Reduced Image (d)
Cropped Leaf Image (e) Background Subtracted Image (f) Grayscale
Background Subtracted Image
As a result, a total of six features are calculated and
appended to the feature vector. These characteristics are
recovered from the background-subtracted leaf, as shown in
Fig. 3(e).
2) Texture Based Feature Extraction
The texture-based characteristics are estimated from the
grayscale background-subtracted leaf acquired in the
preceding phases, as shown in Fig. 3 (f). The Mahotas library
[8] was used to calculate texture-based characteristics.
Contrast, correlation, inverse difference moments, and
entropy were chosen from among the 13 Haralick traits to be
included in the set.
3) Shape-Based Feature Extraction
The following characteristics are extracted: Area,
Perimeter, Physiological Length, Physiological Width, aspect
ratio, rectangularity, and circularity. The shape-based feature
extraction yields a total of seven features.
Fig. 4. Image showing Physiological Width (PW) & Physiological Length
(PL)
TABLE I. FEATURES EXTRATED
No. Feature No. Feature
1. Area 11. Entropy
2. Perimeter 12. Mean R
3. Physiological Length 13. Mean G
4. Physiological Width 14. Mean B
5. Aspect Ratio 15. Standard Deviation Of R
6. Rectangularity 16. Standard Deviation Of G
7. Circularity 17. Standard Deviation Of B
8. Contrast
9. Correlation
10.
Inversely Difference
Movement
C. Disease Detection
Different colour spaces were investigated for the disease
detection part. The masked image with the background
removed was utilised to detect illness. This RGB image has
been translated into the HSV and Lab colour spaces. Each
component of both colour spaces was identified, yielding six
distinct images. Otsu thresholding was used on these six
separate components, and the output was analysed further.
Fig. 5. HSV Color Space Flow of an Image from the Diseased Mendeley
Dataset
2

Fig. 6. HSV Color Space Flow of an Image from the Garden Dataset
III. DATASET
The leaves in this research are drawn from four
independent datasets that have been combined. Because each
dataset contains leaves of various classes and background
conditions, it is advantageous to train models on the
combined dataset to provide more generalizability. Flavia
Dataset [4], Mendeley Dataset [5], Diseased Mendeley
Dataset [6], and a Garden Dataset that is self-curated from a
homegrown garden, with natural photos that users may shoot
without any preprocessing. The flavia dataset and the
Mendeley dataset, in particular, solely include non-diseased
photographs, whereas the rest are diseased images. To keep
things consistent, the background of all these photographs is
set to white. 10 classes were chosen from the Mendeley
Dataset out of a total of 30 classes in this dataset. Lemon,
Peepal Tree, Jasmine, Mango, and other classes have been
chosen. These classifications were chosen due to their
frequency of occurrence in India. Photos of the lemon class,
which consisted of 77 images from the diseased Mendeley
dataset, were included to the diseased component of this
complete dataset. Six lemon leaf images and two mango leaf
images were also included as the Garden Dataset. These
leaves were captured using common and easily accessible
digital scanners.
TABLE II. DESCRIPTION OF DATASETS
Dataset
Total
Classes
Images
Per Class
Total Images
Flavia Dataset (D1) 32 60 1907
Mendeley Dataset (D2) 10 60 603
Diseased Mendeley Dataset
(D3)
1 77 77
Garden Dataset (D4) 2 4 8
Combined Dataset 42 61 2595
IV. RESULTS AND DISCUSSIONS
A. Leaf Classification
Two alternative dataset combinations were used to train
the model. When the model was trained solely on the Flavia
dataset, its accuracy was shown to be lower than when it
was trained on the combined dataset. One of the
explanations could be that when the dataset size and number
of classes rise, the model may be able to explain the
variation in the combined dataset better, as shown in
Table3.
Support Vector Machine is used to train the model
(SVM). To train the model, the RBF kernel was employed.
Because it is taught using a Machine Learning model and
does not use neural networks, the computer resources
required are minimal.
For experimental reasons, the dataset was divided into
two parts: the testing dataset and the training dataset, with
the training dataset accounting for 70% of the total and the
testing dataset accounting for 30%.
For classification, several models were used, including
SVM, KNN, Naive Bayes, Decision Tree, and Random
Forest, and it was discovered that SVM produced the best
results.
TABLE III. ACCURACY FOR DIFFERENT MODELS
DataSet Model Parameters Accuracy
Flavia
Dataset
(32
classes)
SVM
C = 10, kernel = 'rbf',
gamma=0.1
91.45%
KNN
n_neighbors = 5, metric =
'minkowski', p = 2
85.17%
Naive
Bayes
var_smoothing = 10-9
78.88%
Decision
Tree
criterion = 'entropy' 77.49%
Random
Forest
n_estimators = 1000, criterion
= 'entropy'
88.31%
Full
Dataset
with
diseased
Images
(42
classes)
SVM
C = 10, kernel = 'rbf',
gamma=0.1
92.3%
KNN
n_neighbors = 5, metric =
'minkowski', p = 2
83.34%
Naive
Bayes
var_smoothing = 10-9
78.05%
Decision
Tree
criterion = 'entropy' 75.22%
Random
Forest
n_estimators = 1000, criterion
= 'entropy'
89.73%
B. Disease Detection
Comparing outputs from several classes of leaves, it was
discovered that the ‘S’ component of HSV colour space and
‘a’ component of Lab colour space after Otsu thresholding
produced the best accurate segmentation map of diseased
regions.
3

TABLE IV. METRIC IN DESCENDING ORDER BASED ON F1 SCORE
Precision Recall F1 Score Class
1 1 1 True Indigo
1 1 1 Goldenrain Tree
1 1 1 Japanese Cheesewood
1 1 1 Deodar
1 1 1 Canadian Poplar
1 1 1 Ficus Religiosa
1 1 1 Drumstick
1 1 1 Oleander
1 1 1 Mango
0.95 1 0.97 Japanese Maple
1 0.95 0.97 Camphortree
1 0.95 0.97 Trident Maple
1 0.94 0.97 Tangerine
0.93 1 0.97 Carissa Carandas
0.94 1 0.97 Indian Beech
1 0.93 0.96 Castor Aralia
0.96 0.96 0.96 Glossy Privet
1 0.93 0.96 Jasmine
0.9 1 0.95 Chinese Toon
1 0.9 0.95 Beales Barberry
0.88 1 0.94 Sweet Osmanthus
0.96 0.92 0.94 Ginkgo, Maidenhair
1 0.88 0.94 Yew Plum Pine
0.94 0.94 0.94 Mint
0.86 1 0.93 Anhui Barberry
0.91 0.95 0.93 Chinese Redbud
1 0.88 0.93 Peach
1 0.88 0.93 Chinese Tulip Tree
0.96 0.89 0.92 Lemon
0.93 0.87 0.9 Crape & Crepe Myrtle
0.82 1 0.9 Jamaica Cherry
0.8 1 0.89 Nanmu
0.94 0.84 0.89 Southern Magnolia
0.88 0.88 0.88 Oleander
0.83 0.9 0.86 Pubescent bamboo
0.77 0.91 0.83 Japan Arrowwood
1 0.68 0.81 Ford Woodlotus
0.67 0.93 0.78 Big-Fruited Holly
0.83 0.71 0.77 Chinese Cinamon
0.8 0.67 0.73 Wintersweet
0.67 0.77 0.71 Chinese horse chestnut
0.6 0.64 0.62 Japanese Flowering Cherry
V. CONCLUSION
This study suggests a method for classifying leaves and
finding diseased spots within them. Instead of utilising over
parameterized deep learning models, the emphasis in this
study is on establishing a workflow that successfully uses
image processing techniques to complement machine
learning models. This has the advantage of increasing
approach simplicity, interpretability, and lowering the
computing resource and training time burden. The obtained
findings support the idea that this hybrid strategy can
generate good performance in terms of various accuracy
measures even when the combined dataset comprises of
many classes originating from multiple datasets taken under
different conditions. Table 4 displays the results for all 42
classes in descending order based on their F1 Score.
This work has the potential to be expanded in a number
of intriguing directions. Deep learning models are usually
difficult to explain, whereas simpler machine learning
models explain their predictions effectively. As a result,
image processing techniques that are compatible with deep
learning systems are likely to help reduce their overall
complexity, with the potential for increased performance.
Using the concept of colour space transformation ,it
provides an approach that does not require any data labelling
efforts, and if proven to be generalizable across more classes
of leaves, can be a strong alternative to more hardware
intensive learning-based techniques that are commonly used
for this task. This concept could potentially be utilised as a
supplement to machine learning classifiers, resulting in
improved performance and generalizability.
REFERENCES
[1] T. J. Jassmann, R. Tashakkori and R. M. Parry, "Leaf classification
utilizing a convolutional neural network," in IEEE Southeastcon 2015,
Fort Lauderdale, FL, USA, 2015.
[2] V. Srivastava and A. Khunteta, "Comparative Analysis of Leaf
Classification and Recognition by Different SVM Classifiers," in 2018
International Conference on Inventive Research in Computing
Applications (ICIRCA), Coimbatore, India, 2018.
[3] P. Sharma, P. Hans and S. C. Gupta, "Classification Of Plant Leaf
Diseases Using Machine Learning And Image Preprocessing
Techniques," in 2020 10th International Conference on Cloud
Computing, Data Science & Engineering (Confluence), Noida, India,
2020.
[4] F. S. B. E. Y. X. Y. W. Y. C. a. Q. X. S. G. Wu, "A Leaf Recognition
Algorithm for Plant Classification Using Probabilistic Neural
Network," IEEE International Symposium on Signal Processing and
Information Technology, 2007, pp. 11-16, 2007.
[5] A. J. Roopashree S, Medicinal Leaf Dataset, India: Mendeley, 2020.
[6] S. S. CHOUHAN, A. Kaul and U. P. SINGH, A Database of Leaf
Images: Practice towards Plant Conservation with Plant Pathology,
India: Science, Madhav Institute of Technology, 2019.
[7] R. D. A. N. S. a. C. F. E. Prasetyo, "Mango leaf image segmentation on
HSV and YCbCr color spaces using Otsu thresholding," 3rd
International Conference on Science and Technology - Computer
(ICST), pp. 99-103, 2017.
[8] K. S. a. I. D. R. M. Haralick, "Textural Features for Image
Classification," IEEE Transactions on Systems, Man, and Cybernetics,
vol. SMC, no. 3, pp. 610-621, Nov 1973.
4

Image_Processing_based_ML_Framework_for_Leaf_Classification_and_Disease_Detection.pdf

Recommended

Recommended

More Related Content

Similar to Image_Processing_based_ML_Framework_for_Leaf_Classification_and_Disease_Detection.pdf

Similar to Image_Processing_based_ML_Framework_for_Leaf_Classification_and_Disease_Detection.pdf (20)

Recently uploaded

Recently uploaded (20)

Image_Processing_based_ML_Framework_for_Leaf_Classification_and_Disease_Detection.pdf