HYBRID SVM-LR CLASSIFIER FOR POWDERY MILDEW DISEASE PREDICTION IN TOMATO PLANT
PAPER ID-129
ANSHUL BHATIA
RESEARCH SCHOLAR, USIC&T
GURU GOBIND SINGH INDRAPRASTHA UNIVERSITY
DWARKA SECTOR-16C
NEW DELHI (110078), INDIA
7th International Conference on Signal Processing and Integrated Networks (SPIN 2020)
27 - 28 February 2020, Amity University, Noida, India
CONTENTS
• Introduction
• Database Used
• Random Over (RO) Sampling
• Adaptive Sampling based noise reduction (ANR) method
• Support Vector Machine
• Logistic Regression
• Proposed Method
• Experimental Results
• Conclusion and Future Direction
• References
INTRODUCTION
• Powdery mildew is a contagious disease caused by the fungus Leveillula taurica, which can severely affect the quality and productivity of the tomato crop.
• Detecting and treating powdery mildew in tomatoes is therefore crucial, because the disease can adversely affect the yield of tomato plants.
• Machine learning based classification algorithms can be used to develop forecasting models for plant disease prediction.
• A hybrid SVM-LR classifier is proposed in the current study for the detection of powdery mildew disease.
• The hybrid SVM-LR classifier is implemented to obtain more accurate tomato powdery mildew predictions than the previous study.
DATABASE USED
Tomato Powdery Mildew Disease (TPMD) Dataset
• Binary-class imbalanced dataset
• Includes statistics about the severity of powdery mildew disease based on weather conditions
• The overall dataset contains 244 data points with 5 features
• Independent variables: GR (W/m²), LW (%), WS (km/h), RH (%), and T (°C)
• Dependent variable: Day Prediction (DP) (conducive or non-conducive)
RANDOM OVER (RO) SAMPLING
• Non-heuristic resampling technique
• Widely used for balancing imbalanced datasets
• Balances an imbalanced dataset by randomly duplicating existing samples of the minority class until the number of data points in the train set matches that of the majority class
• The following table shows the distribution of classes before and after RO sampling for the TPMD dataset:
TPMD dataset (number of samples)
Class            Before RO sampling    After RO sampling
Conducive        27                    217
Non-Conducive    217                   217
Total            244                   434
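The duplication step described above can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name `random_oversample`, the seed, and the placeholder feature rows are assumptions made for the example, and only the class counts (27 and 217) come from the slide.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # Pool of original samples belonging to this class.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Placeholder feature rows (hypothetical); 1 = conducive, 0 = non-conducive.
labels = [1] * 27 + [0] * 217
samples = [[float(i)] for i in range(len(labels))]

balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

Because the minority class is filled purely by copying existing rows, no new weather measurements are synthesized; the balanced set simply contains repeated conducive days.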
ADAPTIVE SAMPLING BASED NOISE REDUCTION (ANR) METHOD
• Deals with data whose class labels are noisy
• Acts as a wrapper for various classifiers, such as LR, k-Nearest Neighbor (kNN), SVM, weighted kNN, and LDA
• Provides a noise-minimized train set by iteratively estimating the probability that each class label is wrong, using an adaptive sampling technique
• The improved train set obtained from this method reduces the risk of choosing mislabeled samples for model training
• Hence, a more precise and generalized model can be obtained
ADAPTIVE SAMPLING BASED NOISE REDUCTION (ANR) METHOD (CONT.)
• In this study, the ANR method is used with the SVM classifier to reduce noisy (misclassified) labels in the train set obtained from the TPMD dataset.
• The ANR method with the SVM classifier provides the probability of the conducive and non-conducive labels based on the independent weather parameters.
• These probabilities are then used to develop the noise-minimized train set.
• The following table shows a sample of the train set with the probability values of the class labels:
Weather parameters (independent variables) and probabilities of class labels (P, N)
T       RH    LW    WS    GR    P        N
24.8    92    35    1     34    0.998    0.001
21.4    82    30    3     32    0.001    0.998
25.1    83    29    1     41    0.987    0.012
24.3    65    16    2     40    0.001    0.998
30.1    67    34    2     56    0.000    0.999
In this table:
P: Probability of positive class i.e. conducive class
N: Probability of negative class i.e. non-conducive class.
ADAPTIVE SAMPLING BASED NOISE REDUCTION (ANR) METHOD (CONT.)
• Based on these probabilities, a modified train set is built using the predicted class adjustment criteria shown in the following table:
Probabilities comparison    Adjusted class
P > N                       1 (Conducive)
N > P                       0 (Non-Conducive)
• Applying these criteria yields a noise-minimized dataset. A sample of it is shown in the following table:
Weather parameters (independent variables) and class (dependent variable)
T       RH    LW    WS    GR    Class
24.8    92    35    1     34    1
21.4    82    30    3     32    0
25.1    83    29    1     41    1
24.3    65    16    2     40    0
30.1    67    34    2     56    0
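The class adjustment criteria above amount to a simple comparison per row. The sketch below applies them to the five sample rows from the slides. The function name `adjust_class` is hypothetical, and the handling of a tie (P equal to N) is an assumption, since the slides only define the P > N and N > P cases.

```python
# Rows copied from the sample table: (T, RH, LW, WS, GR, P, N).
rows = [
    (24.8, 92, 35, 1, 34, 0.998, 0.001),
    (21.4, 82, 30, 3, 32, 0.001, 0.998),
    (25.1, 83, 29, 1, 41, 0.987, 0.012),
    (24.3, 65, 16, 2, 40, 0.001, 0.998),
    (30.1, 67, 34, 2, 56, 0.000, 0.999),
]


def adjust_class(p, n):
    """Return 1 (conducive) if P > N, otherwise 0 (non-conducive).

    Assumption: a tie (P == N) falls back to 0; the slides do not cover it.
    """
    return 1 if p > n else 0


# Keep the five weather parameters and replace (P, N) with the adjusted class.
noise_minimized = [row[:5] + (adjust_class(row[5], row[6]),) for row in rows]

for record in noise_minimized:
    print(record)
```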
SUPPORT VECTOR MACHINE (SVM)
• Widely used supervised machine learning algorithm
• Each sample in the dataset is plotted as a point in an m-dimensional space, where m is the number of attributes in the dataset
• Each coordinate of a point corresponds to the value of one attribute
• The SVM algorithm identifies the hyper-plane that best separates the two labeled classes
• The hyper-plane with the largest margin, i.e. the largest distance to the nearest samples of either class, is considered the best hyper-plane
[Figure: plot with axes x and Y and elements labelled A, B, and C]
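To make the "largest margin" criterion concrete, the sketch below compares three hand-picked separating hyper-planes on a tiny two-dimensional toy dataset and selects the one with the largest margin. The points, labels, and candidate hyper-planes h1 to h3 are hypothetical. A real SVM finds the maximum-margin hyper-plane by solving an optimization problem rather than by scoring a fixed list of candidates.

```python
import math


def decision_value(w, b, x):
    """Signed value of w.x + b for a point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def separates(w, b, points, labels):
    """True if every point lies on the side matching its label (+1 or -1)."""
    return all(y * decision_value(w, b, x) > 0 for x, y in zip(points, labels))


def margin(w, b, points):
    """Smallest distance from any point to the hyper-plane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(abs(decision_value(w, b, x)) / norm for x in points)


# Hypothetical toy data: two classes in two dimensions.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Hypothetical candidate hyper-planes given as (w, b).
candidates = {
    "h1": ((1.0, 0.0), -3.0),   # the line x = 3
    "h2": ((1.0, 1.0), -5.5),   # the line x + y = 5.5
    "h3": ((0.0, 1.0), -1.5),   # the line y = 1.5
}

valid = {name: margin(w, b, points)
         for name, (w, b) in candidates.items()
         if separates(w, b, points, labels)}
best = max(valid, key=valid.get)
print(best, round(valid[best], 4))
```

All three candidates separate the toy classes, so the margin alone decides which one is preferred.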
LOGISTIC REGRESSION (LR)
• Supervised machine learning algorithm
• Models the relationship between a categorical dependent variable and a set of predictor (independent) variables.
• The prediction obtained from an LR model gives the probabilities of the successful and unsuccessful events for a given set of independent variables.
• If Class is the response variable and T, RH, LW, GR, and WS are the predictor variables, then the LR equation can be written as follows (Equation 1):
\[ \ln\left(\frac{p(\mathrm{Class})}{1 - p(\mathrm{Class})}\right) = \beta_0 + \beta_1 T + \beta_2 RH + \beta_3 LW + \beta_4 GR + \beta_5 WS \tag{1} \]
LOGISTIC REGRESSION (LR) (CONT.)
• In Equation 1:
p(Class) / (1 − p(Class)): odds, i.e. the ratio of the probability of success to the probability of failure
β0 to β5: regression coefficients
Class: response variable (0 or 1)
• The regression coefficients can be estimated using Maximum Likelihood Estimation. Solving Equation 1 for p(Class) gives:
\[ p(\mathrm{Class}) = \frac{e^{\beta_0 + \beta_1 T + \beta_2 RH + \beta_3 LW + \beta_4 GR + \beta_5 WS}}{1 + e^{\beta_0 + \beta_1 T + \beta_2 RH + \beta_3 LW + \beta_4 GR + \beta_5 WS}} \tag{2} \]
• Equation 2 yields probabilities between 0 and 1.
• If p is greater than 0.5, the response variable Class is 1; otherwise it is 0.
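Equations 1 and 2 and the 0.5 threshold can be illustrated with a few lines of Python. The coefficient values in `BETA` are hypothetical numbers chosen only so the example produces both classes; they are not fitted values from the study, and the function names are likewise assumptions.

```python
import math

# Hypothetical coefficients beta_0..beta_5 (illustration only, not fitted values).
BETA = {"b0": -20.0, "T": 0.2, "RH": 0.15, "LW": 0.1, "GR": 0.01, "WS": -0.5}


def linear_predictor(T, RH, LW, GR, WS, beta=BETA):
    """Right-hand side of Equation 1 (the log-odds)."""
    return (beta["b0"] + beta["T"] * T + beta["RH"] * RH
            + beta["LW"] * LW + beta["GR"] * GR + beta["WS"] * WS)


def p_class(T, RH, LW, GR, WS, beta=BETA):
    """Equation 2: e^z / (1 + e^z), written in the equivalent form 1 / (1 + e^-z)."""
    z = linear_predictor(T, RH, LW, GR, WS, beta)
    return 1.0 / (1.0 + math.exp(-z))


def predict(T, RH, LW, GR, WS, beta=BETA):
    """Class is 1 if p > 0.5, otherwise 0."""
    return 1 if p_class(T, RH, LW, GR, WS, beta) > 0.5 else 0


# Two weather rows from the sample table in the slides.
p_first = p_class(T=24.8, RH=92, LW=35, GR=34, WS=1)
p_second = p_class(T=24.3, RH=65, LW=16, GR=40, WS=2)
print(p_first, predict(T=24.8, RH=92, LW=35, GR=34, WS=1))
print(p_second, predict(T=24.3, RH=65, LW=16, GR=40, WS=2))
```

Taking the natural logarithm of p / (1 − p) for either row recovers the linear predictor, which is the relationship stated in Equation 1.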
PROPOSED METHOD
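Flowchart of the proposed method:
• TPMD dataset (imbalanced) → application of RO sampling → TPMD dataset (balanced)
• Balanced dataset → train set (70%) and test set (30%)
• Train set → ANR method with SVM classifier (noise reduction, data cleaning) → modified train set
• Modified train set → LR classifier with 10-fold cross validation → prediction model
• Prediction model and test set (30%) → performance evaluation (accuracy, AUC, and F1-score)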
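The 70/30 split and the 10-fold cross validation in the flowchart can be sketched with index bookkeeping alone. The helper names, the seed, and rounding the test size up with `math.ceil` are assumptions; the slides state only the percentages, the number of folds, and the balanced dataset size of 434.

```python
import math
import random


def train_test_split_indices(n, test_fraction=0.3, seed=0):
    """Shuffle the indices 0..n-1 and split them into train and test lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = math.ceil(test_fraction * n)  # assumption: test size rounded up
    return idx[n_test:], idx[:n_test]


def k_fold_indices(train_idx, k=10):
    """Yield (fit, validate) index lists; fold sizes differ by at most one."""
    folds = [train_idx[i::k] for i in range(k)]
    for i in range(k):
        validate = folds[i]
        fit = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield fit, validate


# 434 is the balanced dataset size reported after RO sampling.
train_idx, test_idx = train_test_split_indices(434)
folds = list(k_fold_indices(train_idx, k=10))
print(len(train_idx), len(test_idx), len(folds))
```

In each of the ten rounds, a classifier would be fitted on the `fit` indices and checked on the `validate` indices, while the test indices stay untouched until the final performance evaluation.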
EXPERIMENTAL RESULTS
Performance metrics by classifier
Classifier    Accuracy    AUC       F1-score
LR            87.02%      0.8777    0.8722
SVM           89.31%      0.8988    0.8923
SVM-LR        92.37%      0.9270    0.9264
[Figure: chart "Performance of SVM, LR and SVM-LR classifier"; x-axis: Classifiers (LR, SVM, SVM-LR); y-axis: Performance Metrics (0.84 to 0.94); series: Accuracy, AUC, F1-score]
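For reference, accuracy and F1-score can be computed from confusion-matrix counts as sketched below. The counts `tp`, `tn`, `fp`, and `fn` are hypothetical, since the slides report only the final metrics. The second part checks, under the assumption that the test set holds 131 samples (30% of 434, rounded up), that each reported accuracy corresponds to a whole number of correct predictions.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical confusion-matrix counts (not taken from the slides).
tp, tn, fp, fn = 60, 61, 5, 5
print(round(accuracy(tp, tn, fp, fn), 4), round(f1_score(tp, fp, fn), 4))

# Accuracies reported in the results table, in percent.
reported = {"LR": 87.02, "SVM": 89.31, "SVM-LR": 92.37}
n_test = 131  # assumption: 30% of the 434 balanced samples, rounded up

implied = {}
for name, pct in reported.items():
    correct = round(pct / 100 * n_test)            # nearest whole count
    implied[name] = (correct, round(100 * correct / n_test, 2))
    print(name, implied[name])
```

AUC is omitted here because it requires the ranked probability scores of the test samples, which the slides do not include.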
CONCLUSION AND FUTURE DIRECTION
• This study presents a hybrid SVM-LR approach for better prediction of powdery mildew disease in tomato plants.
• The proposed approach was implemented on the TPMD dataset and outperformed the individual SVM and LR classifiers in terms of accuracy, AUC, and F1-score.
• The current work did not use any feature selection algorithm to identify the most important features for detecting powdery mildew disease in tomato plants, so it can be extended with feature selection techniques to further improve the performance of the prediction models.
• Various meta-heuristic and optimization algorithms can also be explored for better results.
THANK YOU

More Related Content

Similar to Hybrid SVM_LR classifier for plant disease prediction

Amplification, ROADM and Optical Networking activities at CPqD
Amplification, ROADM and Optical Networking activities at CPqDAmplification, ROADM and Optical Networking activities at CPqD
Amplification, ROADM and Optical Networking activities at CPqD
CPqD
 
Performance Analysis of Fading Channels on Cooperative Mode Spectrum Sensing ...
Performance Analysis of Fading Channels on Cooperative Mode Spectrum Sensing ...Performance Analysis of Fading Channels on Cooperative Mode Spectrum Sensing ...
Performance Analysis of Fading Channels on Cooperative Mode Spectrum Sensing ...
ijtsrd
 

Similar to Hybrid SVM_LR classifier for plant disease prediction (20)

IRJET-A Hybrid Intrusion Detection Technique based on IRF & AODE for KDD-CUP ...
IRJET-A Hybrid Intrusion Detection Technique based on IRF & AODE for KDD-CUP ...IRJET-A Hybrid Intrusion Detection Technique based on IRF & AODE for KDD-CUP ...
IRJET-A Hybrid Intrusion Detection Technique based on IRF & AODE for KDD-CUP ...
 
Gradient Based Adaptive Beamforming
Gradient Based Adaptive BeamformingGradient Based Adaptive Beamforming
Gradient Based Adaptive Beamforming
 
IRJET - Finger Vein Extraction and Authentication System for ATM
IRJET -  	  Finger Vein Extraction and Authentication System for ATMIRJET -  	  Finger Vein Extraction and Authentication System for ATM
IRJET - Finger Vein Extraction and Authentication System for ATM
 
IRJET- Intrusion Detection using IP Binding in Real Network
IRJET- Intrusion Detection using IP Binding in Real NetworkIRJET- Intrusion Detection using IP Binding in Real Network
IRJET- Intrusion Detection using IP Binding in Real Network
 
Water Quality Index Calculation of River Ganga using Decision Tree Algorithm
Water Quality Index Calculation of River Ganga using Decision Tree AlgorithmWater Quality Index Calculation of River Ganga using Decision Tree Algorithm
Water Quality Index Calculation of River Ganga using Decision Tree Algorithm
 
IRJET- Titanic Survival Analysis using Logistic Regression
IRJET-  	  Titanic Survival Analysis using Logistic RegressionIRJET-  	  Titanic Survival Analysis using Logistic Regression
IRJET- Titanic Survival Analysis using Logistic Regression
 
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUESNEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
 
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUESNEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
NEURAL NETWORKS WITH DECISION TREES FOR DIAGNOSIS ISSUES
 
Amplification, ROADM and Optical Networking activities at CPqD
Amplification, ROADM and Optical Networking activities at CPqDAmplification, ROADM and Optical Networking activities at CPqD
Amplification, ROADM and Optical Networking activities at CPqD
 
IRJET - Movie Genre Prediction from Plot Summaries by Comparing Various C...
IRJET -  	  Movie Genre Prediction from Plot Summaries by Comparing Various C...IRJET -  	  Movie Genre Prediction from Plot Summaries by Comparing Various C...
IRJET - Movie Genre Prediction from Plot Summaries by Comparing Various C...
 
AMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLTAMAZON STOCK PRICE PREDICTION BY USING SMLT
AMAZON STOCK PRICE PREDICTION BY USING SMLT
 
AIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTIONAIRLINE FARE PRICE PREDICTION
AIRLINE FARE PRICE PREDICTION
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
Automatic Target Detection using Maximum Average Correlation Height Filter an...
Automatic Target Detection using Maximum Average Correlation Height Filter an...Automatic Target Detection using Maximum Average Correlation Height Filter an...
Automatic Target Detection using Maximum Average Correlation Height Filter an...
 
Performance Analysis of Fading Channels on Cooperative Mode Spectrum Sensing ...
Performance Analysis of Fading Channels on Cooperative Mode Spectrum Sensing ...Performance Analysis of Fading Channels on Cooperative Mode Spectrum Sensing ...
Performance Analysis of Fading Channels on Cooperative Mode Spectrum Sensing ...
 
IRJET- Implementation of TPG-LFSR with Reseeding Pattern Value
IRJET-  	  Implementation of TPG-LFSR with Reseeding Pattern ValueIRJET-  	  Implementation of TPG-LFSR with Reseeding Pattern Value
IRJET- Implementation of TPG-LFSR with Reseeding Pattern Value
 
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio...
IRJET -  	  A Novel Approach for Software Defect Prediction based on Dimensio...IRJET -  	  A Novel Approach for Software Defect Prediction based on Dimensio...
IRJET - A Novel Approach for Software Defect Prediction based on Dimensio...
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
 
IRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms ComparisonIRJET- Supervised Learning Classification Algorithms Comparison
IRJET- Supervised Learning Classification Algorithms Comparison
 
IRJET- Study of Prediction Algorithms on Aviation Accident Dataset using Rapi...
IRJET- Study of Prediction Algorithms on Aviation Accident Dataset using Rapi...IRJET- Study of Prediction Algorithms on Aviation Accident Dataset using Rapi...
IRJET- Study of Prediction Algorithms on Aviation Accident Dataset using Rapi...
 

Recently uploaded

School management system project report.pdf
School management system project report.pdfSchool management system project report.pdf
School management system project report.pdf
Kamal Acharya
 
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单
tuuww
 
Online blood donation management system project.pdf
Online blood donation management system project.pdfOnline blood donation management system project.pdf
Online blood donation management system project.pdf
Kamal Acharya
 
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdf
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdfONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdf
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdf
Kamal Acharya
 

Recently uploaded (20)

School management system project report.pdf
School management system project report.pdfSchool management system project report.pdf
School management system project report.pdf
 
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and VisualizationKIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
KIT-601 Lecture Notes-UNIT-5.pdf Frame Works and Visualization
 
Online resume builder management system project report.pdf
Online resume builder management system project report.pdfOnline resume builder management system project report.pdf
Online resume builder management system project report.pdf
 
NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...
NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...
NO1 Pandit Black Magic Removal in Uk kala jadu Specialist kala jadu for Love ...
 
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdfRESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
RESORT MANAGEMENT AND RESERVATION SYSTEM PROJECT REPORT.pdf
 
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单
一比一原版(UNK毕业证)内布拉斯加州立大学科尼分校毕业证成绩单
 
ENERGY STORAGE DEVICES INTRODUCTION UNIT-I
ENERGY STORAGE DEVICES  INTRODUCTION UNIT-IENERGY STORAGE DEVICES  INTRODUCTION UNIT-I
ENERGY STORAGE DEVICES INTRODUCTION UNIT-I
 
Online book store management system project.pdf
Online book store management system project.pdfOnline book store management system project.pdf
Online book store management system project.pdf
 
Furniture showroom management system project.pdf
Furniture showroom management system project.pdfFurniture showroom management system project.pdf
Furniture showroom management system project.pdf
 
Introduction to Machine Learning Unit-5 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-5 Notes for II-II Mechanical EngineeringIntroduction to Machine Learning Unit-5 Notes for II-II Mechanical Engineering
Introduction to Machine Learning Unit-5 Notes for II-II Mechanical Engineering
 
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...Software Engineering - Modelling Concepts + Class Modelling + Building the An...
Software Engineering - Modelling Concepts + Class Modelling + Building the An...
 
1. Henrich Triangle Safety and Fire Presentation
1. Henrich Triangle Safety and Fire Presentation1. Henrich Triangle Safety and Fire Presentation
1. Henrich Triangle Safety and Fire Presentation
 
Supermarket billing system project report..pdf
Supermarket billing system project report..pdfSupermarket billing system project report..pdf
Supermarket billing system project report..pdf
 
internship exam ppt.pptx on embedded system and IOT
internship exam ppt.pptx on embedded system and IOTinternship exam ppt.pptx on embedded system and IOT
internship exam ppt.pptx on embedded system and IOT
 
Lect 2 - Design of slender column-2.pptx
Lect 2 - Design of slender column-2.pptxLect 2 - Design of slender column-2.pptx
Lect 2 - Design of slender column-2.pptx
 
Online blood donation management system project.pdf
Online blood donation management system project.pdfOnline blood donation management system project.pdf
Online blood donation management system project.pdf
 
Peek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdfPeek implant persentation - Copy (1).pdf
Peek implant persentation - Copy (1).pdf
 
Attraction and Repulsion type Moving Iron Instruments.pptx
Attraction and Repulsion type Moving Iron Instruments.pptxAttraction and Repulsion type Moving Iron Instruments.pptx
Attraction and Repulsion type Moving Iron Instruments.pptx
 
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdf
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdfONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdf
ONLINE VEHICLE RENTAL SYSTEM PROJECT REPORT.pdf
 
2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge2024 DevOps Pro Europe - Growing at the edge
2024 DevOps Pro Europe - Growing at the edge
 

Hybrid SVM_LR classifier for plant disease prediction

  • 1. HYBRID SVM-LR CLASSIFIER FOR POWDERY MILDEW DISEASE PREDICTION IN TOMATO PLANT PAPER ID-129 ANSHUL BHATIA RESEARCH SCHOLAR, USIC&T GURU GOBIND SINGH INDRAPRASTHA UNIVERSITY DWARKA SECTOR-16C NEW DELHI (110078), INDIA 7th International Conference on Signal Processing and Integrated Networks (SPIN 2020) 27 - 28 February 2020, Amity University, Noida, India
  • 2. CONTENTS • Introduction • Database Used • Random Over (RO) Sampling • Adaptive Sampling based noise reduction (ANR) method • Support Vector Machine • Logistic Regression • Proposed Method • Experimental Results • Conclusion and Future Direction • References 2 7th International Conference on Signal Processing and Integrated Networks (SPIN 2020) 27 - 28 February 2020, Amity University, Noida, India
  • 3. INTRODUCTION 3 7th International Conference on Signal Processing and Integrated Networks (SPIN 2020) 27 - 28 February 2020, Amity University, Noida, India • Powdery mildew is a contagious disease caused by fungus named LeveillulaTaurica, which can severely affect the quality and productivity of tomato crop. • So, the detection and treatment of powdery mildew disease in tomatoes is very crucial because it can adversely affect the yield of tomato plants. • Machine learning based classification algorithms can be used for developing forecasting model for plant disease prediction. • A hybrid SVM-LR classifier has been proposed in current study for detection of powdery mildew disease. • Hybrid SVM-LR is implemented here to get more accurate results for tomato powdery mildew prediction as compared to previous study.
  • 4. DATABASE USED 4 7th International Conference on Signal Processing and Integrated Networks (SPIN 2020) 27 - 28 February 2020, Amity University, Noida, India Tomato Powdery Mildew Disease (TPMD) Dataset • Binary-class imbalanced dataset • Includes statistics about severity of powdery mildew disease based on weather conditions • The overall dataset contains 244 data points upon 5 unique features • Independent Variables: GR (watt/m2), LW (%), WS (KM/h), RH (%), and T (ₒc) • Dependent Variables: Day Prediction (DP) (conducive or non-conducive)
  • 5. RANDOM OVER (RO) SAMPLING 5 7th International Conference on Signal Processing and Integrated Networks (SPIN 2020) 27 - 28 February 2020, Amity University, Noida, India • Non-heuristic resampling technique • Widely used for balancing imbalanced datasets • Balances imbalanced dataset by randomly copying the existing samples of minor classes for increasing the number of data points in the train-set in order to balance it with major classes • Following table shows the distribution of classes before and after RO sampling for TPMD dataset: TPMD dataset Class Before RO sampling After RO sampling Conducive 27 217 Non-Conducive 217 217 (No. of samples) 244 434
  • 6. ADAPTIVE SAMPLING BASED NOISE REDUCTION (ANR) METHOD 6 7th International Conference on Signal Processing and Integrated Networks (SPIN 2020) 27 - 28 February 2020, Amity University, Noida, India • Deal with noisy class labeled data • Acts as a wrapper for various classifiers, such as LR, k-Nearest Neighbor (kNN), SVM, weighted kNN, and LDA • Provides a noise-minimized train set by iteratively calculating the probability of class mislabeling by using adaptive sampling technique • Improved train set obtained from this model can reduce the risk of choosing mislabeled samples for training of model • Hence, a precise and generalized model can be obtained
  • 7. ADAPTIVE SAMPLING BASED NOISE REDUCTION (ANR) METHOD (CONT...) 7 7th International Conference on Signal Processing and Integrated Networks (SPIN 2020) 27 - 28 February 2020, Amity University, Noida, India • In this study, ANR method has been used with SVM classifier for the reduction of noise labels (misclassified labels) from the train set obtained from TPMD dataset • ANR method with SVM classifier provides the probability of conducive and non-conducive labels based on the independent weather parameters. • These probabilities have further been used for developing the noise-minimized train set. • Following table shows a sample of train set with the probability value of class labels: Weather-Parameters (Independent Variables) Probabilities of class labels T RH LW WS GR P N 24.8 92 35 1 34 0.998 0.001 21.4 82 30 3 32 0.001 0.998 25.1 83 29 1 41 0.987 0.012 24.3 65 16 2 40 0.001 0.998 30.1 67 34 2 56 0.000 0.999
  • 8. ADAPTIVE SAMPLING BASED NOISE REDUCTION (ANR) METHOD (CONT...) 8 7th International Conference on Signal Processing and Integrated Networks (SPIN 2020) 27 - 28 February 2020, Amity University, Noida, India • In this study, ANR method has been used with SVM classifier for the reduction of noise labels (misclassified labels) from the train set obtained from TPMD dataset • ANR method with SVM classifier provides the probability of conducive and non-conducive labels based on the independent weather parameters. • These probabilities have further been used for developing the noise-minimized train set. • Following table shows a sample of train set with the probability value of class labels: In this table: P: Probability of positive class i.e. conducive class N: Probability of negative class i.e. non-conducive class. Weather-Parameters (Independent Variables) Probabilities of class labels T R H L W W S GR P N 24.8 92 35 1 34 0.998 0.001 21.4 82 30 3 32 0.001 0.998 25.1 83 29 1 41 0.987 0.012 24.3 65 16 2 40 0.001 0.998 30.1 67 34 2 56 0.000 0.999
  • 9. ADAPTIVE SAMPLING BASED NOISE REDUCTION (ANR) METHOD (CONT...) 9 7th International Conference on Signal Processing and Integrated Networks (SPIN 2020) 27 - 28 February 2020, Amity University, Noida, India • Based on these probabilities, a modified train-set has been developed using predicted class adjustment criteria as shown in following table: • Based on above mentioned criteria, a noise-minimized dataset has been developed. A sample of noise-minimized dataset is shown in following table: Probabilities Comparison Adjusted Class P>N 1 (Conducive) N>P 0 (Non-Conducive) Weather-Parameters (Independent Variables) Class (Dependent Variable) T RH LW WS GR 24.8 92 35 1 34 1 21.4 82 30 3 32 0 25.1 83 29 1 41 1 24.3 65 16 2 40 0 30.1 67 34 2 56 0
  • 10. SUPPORT VECTOR MACHINE (SVM) 10 7th International Conference on Signal Processing and Integrated Networks (SPIN 2020) 27 - 28 February 2020, Amity University, Noida, India • Widely used supervised machine learning algorithm • A point is plotted for each sample present in the dataset in an m-dimensional space, where m represents number of attributes available in the dataset • Each coordinate present in space indicates a particular attribute. • SVM algorithm basically identifies the best hyper-plane that divides the two labeled classes accurately • The hyper-plane with the highest marginal difference is considered as the best hyper-plane. x Y A B C
  • 11. LOGISTIC REGRESSION (LR) 11 7th International Conference on Signal Processing and Integrated Networks (SPIN 2020) 27 - 28 February 2020, Amity University, Noida, India • Supervised Machine Learning Algorithm • Shows relationship between categorical dependent variable and a set of predicted (independent) variables. • Prediction obtained from LR model provides the probabilities of successful and unsuccessful events for the collection of independent variables. • If Class is a response variable and T, RH, LW, GR, and WS are predicted variables then the equation of LR can be written as follows (Equation 1): 𝑙𝑛 𝑝 𝐶𝑙𝑎𝑠𝑠 1−𝑝 𝐶𝑙𝑎𝑠𝑠 = 𝛽0 + 𝛽1𝑇 + 𝛽2𝑅𝐻 + 𝛽3𝐿𝑊 + 𝛽4𝐺𝑅 + 𝛽5𝑊𝑆 (1)
  • 12. LOGISTIC REGRESSION (LR) (CONT...)
    • In Equation 1:
      p(Class) / (1 - p(Class)): ratio of the probability of success to the probability of failure
      β0 to β5: regression coefficients
      Class: response variable (0 or 1)
    • The regression coefficients can be estimated using a popular approach known as Maximum Likelihood Estimation. Solving Equation 1 for p(Class) gives:

      $$p(Class) = \frac{e^{\beta_0 + \beta_1 T + \beta_2 RH + \beta_3 LW + \beta_4 GR + \beta_5 WS}}{1 + e^{\beta_0 + \beta_1 T + \beta_2 RH + \beta_3 LW + \beta_4 GR + \beta_5 WS}} \tag{2}$$

    • Equation 2 gives probabilities within the range of 0 to 1.
    • If the value of p is greater than 0.5, the response variable Class is 1; otherwise it is 0.
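Equations 1 and 2 and the 0.5 decision rule can be traced with a short Python sketch. The intercept and coefficients below are hypothetical values chosen only to make the arithmetic visible; they are not fitted values from the study, and the function names are illustrative.

```python
import math


def log_odds(features, coefficients, intercept):
    """Right-hand side of Equation 1: beta_0 + sum of beta_i * x_i."""
    return intercept + sum(b * x for b, x in zip(coefficients, features))


def probability(features, coefficients, intercept):
    """Equation 2, written as 1 / (1 + e^-z), which equals e^z / (1 + e^z)."""
    z = log_odds(features, coefficients, intercept)
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(features, coefficients, intercept, threshold=0.5):
    """Class is 1 if p exceeds the threshold, otherwise 0."""
    p = probability(features, coefficients, intercept)
    return 1 if p > threshold else 0


# One sample in the order used by Equation 1: (T, RH, LW, GR, WS).
sample = (24.8, 92, 35, 34, 1)
# Hypothetical beta_0 and beta_1..beta_5, invented for illustration only.
intercept = -10.0
coefficients = (0.1, 0.05, 0.1, 0.02, -0.3)

z = log_odds(sample, coefficients, intercept)
p = probability(sample, coefficients, intercept)
print(z, p, predict_class(sample, coefficients, intercept))
```

Taking the natural log of p / (1 - p) for the computed p recovers z, which is the sense in which Equation 2 is the inverse of Equation 1.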
  • 13. PROPOSED METHOD
    Flowchart of the proposed method (stages in order):
    • TPMD dataset (imbalanced)
    • Application of RO sampling → TPMD dataset (balanced)
    • Split into train set (70%) and test set (30%)
    • Train set → ANR method with SVM classifier (noise reduction, data cleaning) → modified train set
    • Modified train set → LR classifier with 10-fold cross validation → prediction model
    • Prediction model applied to the test set → performance evaluation (accuracy, AUC, and F1-score)
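The first two stages of the flowchart, random oversampling followed by a 70/30 split, can be sketched with the Python standard library. This is a rough illustration, not the study's implementation: the helper names, the seed values, and the toy data are all assumptions, and the later stages (SVM-based ANR, LR training with 10-fold cross validation) are deliberately left out.

```python
import random


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        # Draw extra copies (with replacement) only for under-represented classes.
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


def train_test_split(rows, labels, train_fraction=0.7, seed=0):
    """Shuffle the indices and cut them into a train part and a test part."""
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = round(train_fraction * len(indices))
    train, test = indices[:cut], indices[cut:]
    return (
        [rows[i] for i in train], [labels[i] for i in train],
        [rows[i] for i in test], [labels[i] for i in test],
    )


# Hypothetical imbalanced toy data: 8 rows of class 0 and 2 rows of class 1.
rows = [(float(i), float(i % 3)) for i in range(10)]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
train_rows, train_labels, test_rows, test_labels = train_test_split(
    balanced_rows, balanced_labels
)
print(len(balanced_rows), len(train_rows), len(test_rows))
```

The sketch follows the order shown in the flowchart, where the dataset is balanced before it is split.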
  • 14. EXPERIMENTAL RESULTS

      Classifier    Accuracy    AUC       F1-score
      LR            87.02%      0.8777    0.8722
      SVM           89.31%      0.8988    0.8923
      SVM-LR        92.37%      0.9270    0.9264

    [Figure: Performance of SVM, LR and SVM-LR classifier. x-axis: Classifiers (LR, SVM, SVM-LR); y-axis: Performance Metrics (0.84 to 0.94); series: Accuracy, AUC, F1-score]
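For readers unfamiliar with the three metrics in the table, the following Python sketch shows how accuracy, F1-score, and AUC are commonly computed for a binary classifier. The labels and scores are hypothetical toy values, not data from the study, and the AUC here uses the pairwise-ranking definition (the fraction of positive/negative pairs ranked correctly, with ties counted as half).

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical true labels and predicted probabilities for six test samples.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s > 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
area = auc(y_true, scores)
print(acc, f1, area)
```

Accuracy and F1-score depend on the 0.5 threshold, while AUC depends only on how the scores rank the samples.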
  • 15. CONCLUSION AND FUTURE DIRECTION
    • This study presents a hybrid SVM-LR approach for improved prediction of powdery mildew disease in tomato plants.
    • The proposed approach was implemented on the TPMD dataset and outperformed the standalone SVM and LR classifiers in terms of accuracy, AUC, and F1-score.
    • The current work did not use any feature selection algorithm to identify the most important features for detecting powdery mildew disease in tomato plants, so it can be extended with feature selection techniques to further improve the performance of the prediction models.
    • Various meta-heuristic and optimization algorithms can also be used for better results.
  • 17. THANK YOU