Hybrid SVM_LR classifier for plant disease prediction

HYBRID SVM-LR CLASSIFIER FOR POWDERY MILDEW DISEASE PREDICTION IN TOMATO PLANT
PAPER ID-129
ANSHUL BHATIA
RESEARCH SCHOLAR, USIC&T
GURU GOBIND SINGH INDRAPRASTHA UNIVERSITY
DWARKA SECTOR-16C
NEW DELHI (110078), INDIA
7th International Conference on Signal Processing and Integrated Networks (SPIN 2020)
27 - 28 February 2020, Amity University, Noida, India

CONTENTS
• Introduction
• Database Used
• Random Over (RO) Sampling
• Adaptive Sampling based noise reduction (ANR) method
• Support Vector Machine
• Logistic Regression
• Proposed Method
• Experimental Results
• Conclusion and Future Direction
• References

INTRODUCTION
• Powdery mildew is a contagious disease caused by the fungus Leveillula taurica, which can severely reduce the quality and productivity of the tomato crop.
• Detection and treatment of powdery mildew in tomatoes are therefore crucial for protecting yield.
• Machine-learning-based classification algorithms can be used to develop forecasting models for plant disease prediction.
• This study proposes a hybrid SVM-LR classifier for detecting powdery mildew disease.
• The hybrid SVM-LR classifier is implemented to obtain more accurate tomato powdery mildew predictions than the previous study.

DATABASE USED
Tomato Powdery Mildew Disease (TPMD) Dataset
• Binary-class imbalanced dataset
• Includes statistics about severity of powdery mildew disease based on weather conditions
• The dataset contains 244 data points described by 5 features
• Independent variables: GR (W/m²), LW (%), WS (km/h), RH (%), and T (°C)
• Dependent variable: Day Prediction (DP) (conducive or non-conducive)

RANDOM OVER (RO) SAMPLING
• Non-heuristic resampling technique
• Widely used for balancing imbalanced datasets
• Balances an imbalanced dataset by randomly duplicating existing minority-class samples until the number of minority-class data points in the train set matches that of the majority class
• The following table shows the class distribution of the TPMD dataset before and after RO sampling:

Class             Before RO sampling   After RO sampling
Conducive         27                   217
Non-Conducive     217                  217
No. of samples    244                  434
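A minimal sketch of random oversampling, using only the Python standard library, may help make the duplication step concrete. The function name `random_oversample`, the fixed seed, and the placeholder feature vectors are assumptions made for illustration; only the class counts (27 conducive, 217 non-conducive) come from the table above, and the sketch is not the implementation used in the study.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of each smaller class until
    every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        # Pool of original samples belonging to this class
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))  # copy of an existing sample
            out_labels.append(cls)
    return out_samples, out_labels

# Placeholder feature vectors (hypothetical); only the class counts mirror the table
labels = [1] * 27 + [0] * 217          # 1 = conducive, 0 = non-conducive
samples = [[float(i)] for i in range(len(labels))]

balanced_samples, balanced_labels = random_oversample(samples, labels)
print(len(balanced_labels), balanced_labels.count(1), balanced_labels.count(0))
```

Because the added rows are exact copies, no new information enters the dataset; the procedure only changes how often each minority sample is seen during training.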

ADAPTIVE SAMPLING BASED NOISE REDUCTION (ANR) METHOD
• Deals with data whose class labels are noisy
• Acts as a wrapper for various classifiers, such as LR, k-Nearest Neighbor (kNN), SVM, weighted kNN, and LDA
• Produces a noise-minimized train set by iteratively estimating the probability that each sample's class label is wrong, using an adaptive sampling technique
• Training on this improved train set reduces the risk of fitting the model to mislabeled samples
• Hence, a more precise and better-generalized model can be obtained

ADAPTIVE SAMPLING BASED NOISE REDUCTION (ANR) METHOD (CONT.)
• In this study, the ANR method has been used with the SVM classifier to reduce noisy (mislabeled) class labels in the train set obtained from the TPMD dataset
• The ANR method with the SVM classifier provides the probability of the conducive and non-conducive labels for each sample, based on the independent weather parameters
• These probabilities have then been used to develop the noise-minimized train set
• The following table shows a sample of the train set with the probability values of the class labels:

Weather parameters (independent variables)   Probabilities of class labels
T      RH    LW    WS    GR                  P       N
24.8   92    35    1     34                  0.998   0.001
21.4   82    30    3     32                  0.001   0.998
25.1   83    29    1     41                  0.987   0.012
24.3   65    16    2     40                  0.001   0.998
30.1   67    34    2     56                  0.000   0.999

In the table above:
P: Probability of the positive class, i.e., the conducive class
N: Probability of the negative class, i.e., the non-conducive class

• Based on these probabilities, a modified train set has been developed using the predicted class adjustment criteria shown in the following table:

Probabilities comparison   Adjusted class
P > N                      1 (Conducive)
N > P                      0 (Non-Conducive)

• Based on the above criteria, a noise-minimized dataset has been developed. A sample of the noise-minimized dataset is shown in the following table:

Weather parameters (independent variables)   Class (dependent variable)
T      RH    LW    WS    GR                  Class
24.8   92    35    1     34                  1
21.4   82    30    3     32                  0
25.1   83    29    1     41                  1
24.3   65    16    2     40                  0
30.1   67    34    2     56                  0
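The class adjustment rule (P > N gives class 1, N > P gives class 0) can be sketched in a few lines of Python. The rows below are the five sample rows shown in the slides; the function name `adjust_class` is hypothetical, and mapping a tie (P equal to N) to class 0 is an assumption, since the slides do not say how ties are handled. The sketch covers only the final relabeling step, not the iterative probability estimation performed by the ANR method.

```python
def adjust_class(p_conducive, p_non_conducive):
    """Return 1 (conducive) when P > N, otherwise 0 (non-conducive).
    Ties fall to 0 here, which is an assumption, not a rule from the slides."""
    return 1 if p_conducive > p_non_conducive else 0

# Sample rows from the slides: (T, RH, LW, WS, GR, P, N)
rows = [
    (24.8, 92, 35, 1, 34, 0.998, 0.001),
    (21.4, 82, 30, 3, 32, 0.001, 0.998),
    (25.1, 83, 29, 1, 41, 0.987, 0.012),
    (24.3, 65, 16, 2, 40, 0.001, 0.998),
    (30.1, 67, 34, 2, 56, 0.000, 0.999),
]

# Keep the five weather parameters and replace (P, N) with the adjusted class
noise_minimized = [(*row[:5], adjust_class(row[5], row[6])) for row in rows]
for record in noise_minimized:
    print(record)
```

The resulting class column matches the sample of the noise-minimized dataset shown in the slides.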

SUPPORT VECTOR MACHINE (SVM)
• Widely used supervised machine learning algorithm
• Each sample in the dataset is plotted as a point in an m-dimensional space, where m is the number of attributes in the dataset
• Each coordinate axis of this space corresponds to one attribute
• The SVM algorithm identifies the hyper-plane that best separates the two labeled classes
• The hyper-plane with the largest margin, i.e., the greatest distance to the nearest samples of either class, is considered the best hyper-plane
[Figure: SVM illustration on x and y axes, with elements labelled A, B, and C]
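To illustrate the "largest margin" criterion, the following sketch compares a few candidate hyper-planes on toy two-dimensional data and picks the separating one with the greatest geometric margin. The data points, the candidates H1 to H3, and all function names are hypothetical. A real SVM solves an optimization problem over all possible hyper-planes rather than comparing a fixed list, so this shows the selection criterion only, not the training algorithm.

```python
import math

def decision_value(w, b, x):
    """Signed value of w.x + b for point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def separates(w, b, points, labels):
    """True if every point lies on the side matching its label (+1 or -1)."""
    return all(y * decision_value(w, b, x) > 0 for x, y in zip(points, labels))

def geometric_margin(w, b, points):
    """Distance from the hyper-plane w.x + b = 0 to the nearest point."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(decision_value(w, b, x)) for x in points) / norm

# Hypothetical toy data: two classes in a 2-D attribute space
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Hypothetical candidate hyper-planes, given as (w, b)
candidates = {
    "H1": ((1.0, 0.0), -3.0),  # vertical line x = 3
    "H2": ((1.0, 1.0), -5.5),  # diagonal line x + y = 5.5
    "H3": ((0.0, 1.0), -1.5),  # horizontal line y = 1.5
}

# Keep only candidates that separate the classes, then take the widest margin
margins = {
    name: geometric_margin(w, b, points)
    for name, (w, b) in candidates.items()
    if separates(w, b, points, labels)
}
best = max(margins, key=margins.get)
print(best, round(margins[best], 3))
```

All three candidates separate the toy classes correctly, so the margin is what distinguishes them.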

LOGISTIC REGRESSION (LR)
• Supervised machine learning algorithm
• Models the relationship between a categorical dependent variable and a set of predictor (independent) variables
• The prediction obtained from an LR model gives the probabilities of the successful and unsuccessful events for a given set of independent-variable values
• If Class is the response variable and T, RH, LW, GR, and WS are the predictor variables, the LR equation can be written as follows (Equation 1):
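$$\ln\left(\frac{p(\mathrm{Class})}{1 - p(\mathrm{Class})}\right) = \beta_0 + \beta_1 T + \beta_2 \mathit{RH} + \beta_3 \mathit{LW} + \beta_4 \mathit{GR} + \beta_5 \mathit{WS} \tag{1}$$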

LOGISTIC REGRESSION (LR) (CONT.)
• In Equation 1:
p(Class)/(1 − p(Class)): odds, i.e., the ratio of the probability of success to the probability of failure
β0 to β5: regression coefficients
Class: response variable (0 or 1)
• The regression coefficients can be estimated by Maximum Likelihood Estimation. Solving Equation 1 for p(Class) gives:
$$p(\mathrm{Class}) = \frac{e^{\beta_0 + \beta_1 T + \beta_2 \mathit{RH} + \beta_3 \mathit{LW} + \beta_4 \mathit{GR} + \beta_5 \mathit{WS}}}{1 + e^{\beta_0 + \beta_1 T + \beta_2 \mathit{RH} + \beta_3 \mathit{LW} + \beta_4 \mathit{GR} + \beta_5 \mathit{WS}}} \tag{2}$$
• Equation 2 yields probabilities in the range 0 to 1.
• If p is greater than 0.5, the response variable Class is 1; otherwise it is 0.
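Equation 2 and the 0.5 threshold can be sketched as follows. The coefficient values are hypothetical placeholders chosen only to make the example run; they are not fitted values from the study. The function names are also assumptions. The code uses the algebraically equivalent form 1/(1 + e^(−z)) of Equation 2, where z is the linear predictor.

```python
import math

def p_class(features, coefficients):
    """Equation 2: probability that Class = 1 for the given weather parameters."""
    z = coefficients["b0"] + sum(
        coefficients[name] * value for name, value in features.items()
    )
    # e^z / (1 + e^z) rewritten as 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(features, coefficients, threshold=0.5):
    """Class is 1 if p is greater than the threshold, otherwise 0."""
    return 1 if p_class(features, coefficients) > threshold else 0

# Hypothetical coefficients (NOT estimated from the TPMD dataset)
coefficients = {"b0": -10.0, "T": 0.1, "RH": 0.08, "LW": 0.05, "GR": -0.02, "WS": -0.3}

# One day's weather parameters, taken from the sample rows in the slides
day = {"T": 24.8, "RH": 92, "LW": 35, "GR": 34, "WS": 1}

print(round(p_class(day, coefficients), 3), predict_class(day, coefficients))
```

With all coefficients set to zero, the linear predictor is 0 and p is exactly 0.5, which the rule above assigns to class 0 because p is not greater than 0.5.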

PROPOSED METHOD
[Flowchart of the proposed method]
1. TPMD dataset (imbalanced)
2. Application of RO sampling
3. TPMD dataset (balanced), split into train set (70%) and test set (30%)
4. Noise reduction (data cleaning) of the train set: ANR method with SVM classifier
5. Modified train set
6. LR classifier with 10-fold cross validation
7. Prediction model, applied to the test set (30%)
8. Performance evaluation (Accuracy, AUC, and F1-score)

EXPERIMENTAL RESULTS
             Performance metrics
Classifier   Accuracy   AUC      F1-score
LR           87.02%     0.8777   0.8722
SVM          89.31%     0.8988   0.8923
SVM-LR       92.37%     0.9270   0.9264
[Figure: Performance of SVM, LR and SVM-LR classifiers. Chart of Accuracy, AUC, and F1-score (y-axis: performance metrics, 0.84 to 0.94; x-axis: classifiers LR, SVM, SVM-LR)]
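For readers unfamiliar with the three metrics, the sketch below computes accuracy, F1-score, and AUC on a small set of made-up labels and scores. The data are hypothetical and unrelated to the TPMD results; the function names are assumptions, and the AUC is computed as the fraction of (positive, negative) pairs ranked correctly, which may differ in detail from the tooling used in the study.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) with class 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall, written in terms of counts."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn)

def auc(y_true, scores):
    """Fraction of positive/negative pairs where the positive scores higher
    (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities of class 1
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
y_pred = [1 if s > 0.5 else 0 for s in scores]

print(accuracy(y_true, y_pred), f1_score(y_true, y_pred), auc(y_true, scores))
```

Accuracy and F1-score depend on the 0.5 decision threshold, whereas AUC depends only on how the scores rank the samples.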

CONCLUSION AND FUTURE DIRECTION
• This study presents a hybrid SVM-LR approach for better prediction of powdery mildew disease in tomato plants.
• The proposed approach was implemented on the TPMD dataset and outperformed the standalone SVM and LR classifiers in accuracy, AUC, and F1-score.
• The current work did not use any feature selection algorithm to identify the most important features for detecting powdery mildew disease in tomato plants.
• It can therefore be extended with feature selection techniques to further improve the performance of the prediction models.
• Various meta-heuristic and optimization algorithms could also be explored for better results.

REFERENCES
• W. B. Jones, S. V. Thomson, and others, “Source of inoculum, yield, and quality of tomato as affected by Leveillula taurica,” Plant Dis., vol. 71, no. 3, pp. 266–268, 1987.
• U. Braun and others, “A monograph of the Erysiphales (powdery mildews),” Beihefte zur Nov. Hedwigia, no. 89, 1987.
• A. R. T. Bakeer, M. A. E. Abdel-Latef, M. A. Afifi, and M. E. Barakat, “Validation of Tomato Powdery Mildew Forecasting Model using Meteorological Data in Egypt,” Int. J. Agric. Sci., vol. 5, no. 2, p. 372, 2013.
• R. A. Guzman-Plazola, Development of a Spray Forecast Model for Tomato Powdery Mildew (Leveillula Taurica (Lev). Arn.). University of California, Davis, 1997.
• A. Fuentes, S. Yoon, S. Kim, and D. Park, “A robust deep-learning-based detector for real-time tomato plant diseases and pests recognition,” Sensors, vol. 17, no. 9, p. 2022, 2017.
• U. Mokhtar, M. A. S. Ali, A. E. Hassenian, and H. Hefny, “Tomato leaves diseases detection approach based on support vector machines,” in 2015 11th International Computer Engineering Conference (ICENCO), 2015, pp. 246–250.
• R. Ghaffari et al., “Early detection of diseases in tomato crops: An electronic nose and intelligent systems approach,” in The 2010 International Joint Conference on Neural Networks (IJCNN), 2010, pp. 1–6.
• S. Verma, A. Bhatia, A. Chug, and A. P. Singh, “Recent Advancements in Multimedia Big Data Computing for IoT Applications in Precision Agriculture: Opportunities, Issues, and Challenges,” in Multimedia Big Data Computing for IoT Applications, Springer, 2020, pp. 391–416.
• S. Verma, A. Chug, and A. P. Singh, “Prediction Models for Identification and Diagnosis of Tomato Plant Diseases,” in 2018 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2018, pp. 1557–1563.
• S. Verma, A. Chug, A. P. Singh, S. Sharma, and P. Rajvanshi, “Deep Learning-Based Mobile Application for Plant Disease Diagnosis: A Proof of Concept With a Case Study on Tomato Plant,” in Applications of Image Processing and Soft Computing Systems in Agriculture, IGI Global, 2019, pp. 242–271.
• T. Rumpf, A.-K. Mahlein, U. Steiner, E.-C. Oerke, H.-W. Dehne, and L. Plümer, “Early detection and classification of plant diseases with support vector machines based on hyperspectral reflectance,” Comput. Electron. Agric., vol. 74, no. 1, pp. 91–99, 2010.
• G. Prince, J. P. Clarkson, N. M. Rajpoot, and others, “Automatic detection of diseased tomato plants using thermal and stereo visible light images,” PLoS One, vol. 10, no. 4, p. e0123262, 2015.
• M. McGrath, “Powdery mildew on tomatoes.” [Online]. Available: http://blogs.cornell.edu/livegpath/gallery/tomato/powdery-mildew-on-tomatoes/.
• G. E. Batista, R. C. Prati, and M. C. Monard, “A study of the behavior of several methods for balancing machine learning training data,” ACM SIGKDD Explor. Newsl., vol. 6, no. 1, pp. 20–29, 2004.
• P. Yang, J. T. Ormerod, W. Liu, C. Ma, A. Y. Zomaya, and J. Y. H. Yang, “AdaSampling for positive-unlabeled and label noise learning with bioinformatics applications,” IEEE Trans. Cybern., vol. 49, no. 5, pp. 1932–1943, 2018.
• J. A. K. Suykens and J. Vandewalle, “Least squares support vector machine classifiers,” Neural Process. Lett., vol. 9, no. 3, pp. 293–300, 1999.

THANK YOU