Master Seminar-I
Application of Machine Learning in Agriculture
Aman Vasisht
PGS20AGR8404
Dept. of Agricultural Statistics
UNIVERSITY OF AGRICULTURAL SCIENCES,
DHARWAD
COLLEGE OF AGRICULTURE, DHARWAD
OUTLINE
MACHINE LEARNING AND ITS APPLICATIONS
TYPES OF MACHINE LEARNING ALGORITHMS
CASE STUDIES
CONCLUSION
REFERENCES
QUICK QUESTIONNAIRE
How many of you have heard about Machine Learning?
How many of you know about Machine Learning?
How many of you are using Machine Learning?
What is Machine Learning?
• It is the science of programming computers so they can learn from data.
• It is a type of AI that allows applications to become more accurate at predicting outcomes without being explicitly programmed to do so.
(Figure: Artificial Intelligence, Machine Learning, Deep Learning and Data Science)
AI : Programs with the ability to learn and reason like humans
ML : Algorithms with the ability to learn and make informed decisions
DL : Artificial neural networks that adapt and learn from vast amounts of data
APPLICATIONS IN AGRICULTURE :
• Yield Prediction : An accurate model can help farm owners make informed management decisions for their farms.
• Disease Detection : Classification algorithms can identify diseased plants with good accuracy.
• Crop quality : Commodities can be graded using measurable quality parameters.
• Livestock Management : Farms can be managed by monitoring and controlling conditions and parameters.
TERMINOLOGY :
• Features : The distinct, measurable traits used to describe an observation in a quantitative manner.
• Label or Target : The final outcome or variable, which depends on the contribution of the features.
• Training : Fitting the algorithm to a dataset so that it learns the relationship between features and labels.
• Testing : Checking the accuracy of the trained model's predictions on data it has not seen.
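The training and testing steps above are often set up with a random hold-out split. The sketch below is a minimal illustration, assuming 30% of the records are held back for testing; the helper `holdout_split` and the toy records are hypothetical and written with the Python standard library only.

```python
import random


def holdout_split(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split it into (train, test) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical labelled records: (features, label)
records = [((i, 2 * i), "healthy" if i % 2 else "unhealthy") for i in range(10)]
train, test = holdout_split(records)
print(len(train), len(test))  # 7 3
```

The model is fitted on `train` only; `test` is kept aside to check the accuracy of its predictions.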
Let’s understand this with an example.
Training :
Apple
Features :
1. Color : Reddish
2. Type : Fruit
3. Shape, etc.
Features :
1. Color : Greyish
2. Type : Company Logo
3. Shape, etc.
Features :
1. Color : Yellowish
2. Type : Fruit
3. Shape, etc.
a) Training : Input → Feature extraction → Features + Label → Machine Learning algorithm
b) Testing : Input → Feature extraction → Features → Classifier model → Label
TYPES OF MACHINE LEARNING :
• Supervised Learning : The model learns from past data to predict future outcomes. Both features and labels must be given to the model for it to be trained.
• Unsupervised Learning : The model identifies hidden patterns in the input data without labels. By making the data more readable and organized, the patterns, similarities or anomalies become more evident.
SUPERVISED LEARNING :
• Let the feature variables be ‘X’ and the output or label variable be ‘Y’. An algorithm is used to learn the mapping function from the input to the output:
Y = f(X)
The goal is to approximate the mapping function so well that, for new input data (X), we can predict the output variable (Y).
Classification : A classification problem is when the output variable is a category,
such as :
Effective or non-effective
Disease or no disease, etc.
MAJOR ALGORITHMS
Classification :
• KNN
• SVM
• Logistic Regression
• Decision Tree Classifier
• Naive Bayes
Regression : A regression problem is
when the output variable is a real
value, such as “yield” or “weight”. The
algorithms under this category are :
• Linear Regression
• Multiple Regression
• Polynomial Regression
• Lasso Regression
• Ridge Regression
UNSUPERVISED LEARNING :
• Unsupervised learning is where you only have input data (X) and no corresponding
output variables.
• The goal for unsupervised learning is to model the underlying structure or
distribution in the data in order to learn more about the data.
• Clustering: A clustering problem is where you want to discover the inherent
groupings in the data, such as grouping customers by purchasing behavior.
Algorithms : DBSCAN, K-means clustering, Hierarchical clustering
(Figure: Supervised Learning, where the model is trained on input data of known fruits and predicts "It’s an apple"; Unsupervised Learning, where unlabelled input data is given to the model.)
GOOD OR BAD MACHINE LEARNING MODEL :
• The main goal of each machine learning model is to generalize well.
• Generalization is the ability of an ML model, trained on one dataset, to produce reliable and accurate output on new data it has not seen.
• Underfitting and overfitting are the two conditions that must be checked to judge the model's performance and whether it is generalizing well.
Before understanding overfitting and underfitting, let's understand some basic
terms :
Bias
Variance
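Bias-Variance Tradeoff :
Let Y be the dependent variable and X the independent variable, with
$$Y = f(X) + \epsilon, \qquad \epsilon \sim N(0, \sigma_\epsilon^2)$$
We may estimate a model $\hat{f}(X)$ of $f(X)$ using regression. The expected squared error is then
$$\mathrm{Err}(X) = E\left[\left(Y - \hat{f}(X)\right)^2\right]$$
This error may be decomposed into bias and variance components:
$$\mathrm{Err}(X) = \left(E[\hat{f}(X)] - f(X)\right)^2 + E\left[\left(\hat{f}(X) - E[\hat{f}(X)]\right)^2\right] + \sigma_\epsilon^2$$
$$\mathrm{Err}(X) = \mathrm{Bias}^2 + \mathrm{Variance} + \text{Irreducible Error}$$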
(Figure: Combinations of low/high bias and low/high variance)
SIMPLE REGRESSION :
• Linear regression is one of the easiest and most popular Machine Learning
algorithms that is used for predictive analysis.
• $y = a_0 + a_1 x + \epsilon$
y = Dependent variable (target variable)
x = Independent variable (predictor variable)
a0 = Intercept of the line
a1 = Linear regression coefficient (slope)
We wish to find a0 and a1 such that $\sum_i \left(y_i - (a_0 + a_1 x_i)\right)^2$ is minimum:
$$a_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad a_0 = \bar{y} - a_1 \bar{x}$$
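The least-squares formulas for a0 and a1 translate directly into code. This is a minimal sketch in plain Python; the function names and the small dataset are hypothetical (they are not the data behind the plots on these slides), and R² is computed as 1 − RSS / total sum of squares.

```python
def fit_simple_regression(x, y):
    """Least-squares estimates of a0 (intercept) and a1 (slope) for y = a0 + a1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1


def r_squared(x, y, a0, a1):
    """Share of the variation in y explained by the fitted line."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (a0 + a1 * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data lying exactly on y = 1 + 2x
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
a0, a1 = fit_simple_regression(x, y)
print(a0, a1, r_squared(x, y, a0, a1))  # 1.0 2.0 1.0
```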
(Figure: Scatter plot of Y against X)
(Figure: Best fit line, y = 0.7019x + 2.4094, R² = 0.8363)
What happens to the regression line if we add a couple of large values to the data?
Let’s check it.
(Figure: Best fit line with outliers, y = 1.5952x - 4.3564, R² = 0.4629)
Detection of outliers in Machine Learning model:
• Using Z score :
• The Z score tells us whether a data value is greater or smaller than the mean, and by how many standard deviations:
$$Z = \frac{x - \bar{x}}{\sigma}$$
• Values above $\bar{x} + 3\sigma$ or below $\bar{x} - 3\sigma$ (that is, $|Z| > 3$) are considered outliers.
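The Z-score rule can be sketched with Python's `statistics` module. This assumes σ is the population standard deviation (`pstdev`) and a threshold of 3; the yield values are hypothetical.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the values whose Z score (x - mean) / sigma exceeds the threshold in magnitude."""
    mean = statistics.mean(values)
    sigma = statistics.pstdev(values)  # population standard deviation
    return [v for v in values if abs((v - mean) / sigma) > threshold]


# Hypothetical yields: 19 ordinary values and one extreme value
yields = [10, 11, 12] * 6 + [11, 60]
print(zscore_outliers(yields))  # [60]
```

Note that with very small samples a single outlier inflates σ so much that |Z| may never exceed 3, which is one reason the IQR rule is also used.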
Q. What is the most appropriate
measure of central tendency when the
data has outliers?
The median is usually preferred in these
situations because the value of the mean
can be distorted by the outliers.
• Inter-Quartile Range (IQR) proximity rule :
Data points that fall below Q1 – 1.5 IQR or above Q3 + 1.5 IQR, where IQR = Q3 – Q1, are outliers.
A box plot, also called a box-and-whisker plot, is a graphical method for applying this rule. Its purpose is to identify outliers so that they can be discarded from the data series.
(Figure: Box plots of crop production and crop area)
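The IQR proximity rule can be sketched with `statistics.quantiles` (Python 3.8+). Quartile conventions differ slightly between tools; this sketch assumes the module's default "exclusive" method, and the data are hypothetical.

```python
import statistics


def iqr_outliers(values):
    """Flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


data = [10, 11, 12, 12, 13, 13, 14, 15, 40]  # hypothetical crop production figures
print(iqr_outliers(data))  # [40]
```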
ASSUMPTIONS OF REGRESSION ANALYSIS IN ML :
• Linear and additive relationship
• No autocorrelation
• No multicollinearity
• Homoscedasticity
• Normal distribution of errors
These assumptions are often violated in practice, and if a researcher overlooks the violations, the model can become unreliable for prediction.
REGULARIZATION :
• Regularization is an important technique used to avoid overfitting, especially when the model's performance on the training and test data differs greatly.
• Two types :
L2 Ridge regression
L1 Lasso regression
L2 Ridge regression :
• It performs ‘L2 regularization’, i.e. it adds a penalty equal to the sum of the squared magnitudes of the coefficients. Thus, it optimizes the following:
Objective = RSS (loss) + 𝜆 × (sum of squares of coefficients) (penalty)
• 𝜆 is the tuning parameter that balances the emphasis given to minimizing RSS versus minimizing the sum of squares of coefficients.
• In most cases, it is used to prevent overfitting.
• It is also useful when the predictors are multicollinear.
• It reduces model complexity by shrinking the coefficients.
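For a single predictor the ridge objective has a closed form, which makes the shrinkage easy to see. This is a minimal sketch, assuming one feature and an unpenalized intercept; with centered x and y, setting the derivative of RSS + 𝜆·a1² to zero gives a1 = Sxy / (Sxx + 𝜆). The function name and data are hypothetical.

```python
def ridge_one_feature(x, y, lam):
    """Ridge fit of y = a0 + a1*x minimizing RSS + lam * a1**2 (intercept not penalized).

    After centering x and y, the minimizer is a1 = Sxy / (Sxx + lam);
    lam = 0 recovers ordinary least squares.
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    a1 = sxy / (sxx + lam)
    a0 = y_bar - a1 * x_bar
    return a0, a1


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
for lam in (0, 10, 100):
    print(lam, ridge_one_feature(x, y, lam))  # the slope shrinks as lam grows
```

With 𝜆 = 0 the slope is the least-squares value 2.0; with 𝜆 = 10 it is halved to 1.0, and the intercept moves so the line still passes through the means.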
L1 Lasso regression :
LASSO stands for Least Absolute Shrinkage and Selection Operator.
• Lasso regression performs L1 regularization, i.e. it adds the sum of the absolute values of the coefficients to the optimization objective.
• Objective = RSS + 𝜆 × (sum of absolute values of coefficients)
It is generally preferred when there are many features, because it performs feature selection automatically (it can shrink some coefficients exactly to zero), which gives it an advantage over ridge regression.
(Figure: Constraint region, with RSS contours growing as they move away from the minimum)
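The feature-selection behaviour of lasso can be illustrated with cyclic coordinate descent, a common way to minimize RSS + 𝜆·Σ|b_j|. This is a plain-Python sketch, not any library's implementation; the function names and the two-feature dataset are hypothetical. Each coordinate update is a soft-threshold at 𝜆/2, which is what lets a coefficient land exactly on zero.

```python
def soft_threshold(rho, t):
    """Shrink rho towards zero by t; anything inside [-t, t] becomes exactly 0."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def lasso(X, y, lam, n_iter=100):
    """Minimize RSS + lam * sum(|b_j|) by cyclic coordinate descent.

    X is a list of rows. Columns and y are centered first, so the intercept
    is not penalized and is recovered at the end.
    """
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = z = 0.0
            for i in range(n):
                # residual of sample i when feature j is left out of the fit
                partial = yc[i] - sum(Xc[i][k] * beta[k] for k in range(p) if k != j)
                rho += Xc[i][j] * partial
                z += Xc[i][j] ** 2
            # setting the derivative of RSS + lam*|b_j| to zero gives a
            # soft-threshold at lam / 2
            beta[j] = soft_threshold(rho, lam / 2) / z
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta


# Hypothetical data: y = 2*x1 + 0.05*x2, so x2 has only a tiny effect
X = [[1, 1], [2, -1], [3, 1], [4, -1], [5, 1], [6, -1]]
y = [2 * a + 0.05 * b for a, b in X]

print(lasso(X, y, lam=0.0)[1])  # close to [2.0, 0.05] (ordinary least squares)
print(lasso(X, y, lam=1.0)[1])  # second coefficient is exactly 0.0
```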
CLASSIFICATION :
• A common job of machine learning algorithms is to recognize objects and separate them into categories.
KNN (K-Nearest Neighbor) algorithm:
• K-NN is a non-parametric algorithm.
• It is also called a lazy learner algorithm.
• During the training phase, the KNN algorithm simply stores the dataset; when it receives new data, it classifies that data into a category based on a distance measure.
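• One of these measures is the Minkowski distance between two points p and q, where c is a parameter:
$$D(p, q) = \left(\sum_{i=1}^{n} |p_i - q_i|^c\right)^{1/c}$$
Euclidean distance (c = 2): $D(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$
Manhattan distance (c = 1): $D(p, q) = \sum_{i=1}^{n} |p_i - q_i|$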
(Figure: Points P and Q plotted in feature space)
• The number of neighbors K is generally taken as odd (3, 5, etc.) so that the votes between two classes cannot tie.
• KNN is very simple.
• It works with any number of classes.
• Re-scaling is very important because it is a distance-based algorithm.
(Figure: KNN classification of a new point with K = 5)
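A KNN classifier using the Minkowski distance fits in a few lines of Python. This sketch assumes majority voting among the K nearest stored points; the function names and the two-cluster "leaf" data are hypothetical. Both features here are already on the same scale, so no re-scaling step is shown.

```python
from collections import Counter


def minkowski(p, q, c=2):
    """Minkowski distance; c=2 gives Euclidean, c=1 gives Manhattan."""
    return sum(abs(pi - qi) ** c for pi, qi in zip(p, q)) ** (1 / c)


def knn_predict(train_points, train_labels, query, k=5, c=2):
    """Classify query by majority vote among its k nearest training points."""
    distances = sorted(
        (minkowski(point, query, c), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical stored training data: two groups of leaves in a 2-feature space
points = [(1, 2), (2, 1), (2, 3), (3, 2), (2, 2),
          (8, 8), (7, 8), (8, 7), (9, 8), (8, 9)]
labels = ["healthy"] * 5 + ["unhealthy"] * 5
print(knn_predict(points, labels, (3, 3), k=5))  # healthy
print(knn_predict(points, labels, (7, 7), k=5))  # unhealthy
```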
Accuracy :
Confusion matrix (rows: actual value, columns: predicted value):
n      0      1
0      TN     FP
1      FN     TP
TN : True Negative
FP : False Positive
FN : False Negative
TP : True Positive
Let’s see an example:
n = 150             Predicted healthy    Predicted unhealthy
Actual healthy      40                   10
Actual unhealthy    5                    95
Accuracy = Correctly predicted / (TN + FP + FN + TP)
Error rate = Wrongly predicted / (TN + FP + FN + TP)
Therefore,
Accuracy = (40 + 95)/(40 + 10 + 5 + 95) = 135/150 = 0.9 or 90%
Error rate = (10 + 5)/150 = 15/150 = 0.1
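The accuracy and error-rate arithmetic can be checked in code. This sketch takes "healthy" as the negative class, which is an assumption; accuracy does not depend on which class is called positive. The function name is hypothetical.

```python
def accuracy_and_error_rate(tn, fp, fn, tp):
    """Return (accuracy, error rate) from the four confusion-matrix cells."""
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total    # correctly predicted / total
    error_rate = (fp + fn) / total  # wrongly predicted / total
    return accuracy, error_rate


# Confusion matrix from the example above
acc, err = accuracy_and_error_rate(tn=40, fp=10, fn=5, tp=95)
print(acc, err)  # 0.9 0.1
```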
But how do we know which number to take as K?
Is it 5, 7 or some other number?
Support Vector Machine (SVM) :
• In the SVM algorithm, we plot each data item as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate.
• The goal is to find the decision boundary that best separates the classes.
Two types :
• Linear SVM : used when a dataset can be separated into two classes by a single straight line.
• Non-linear SVM : used when a dataset cannot be separated by a straight line.
(Figure: Maximum margin, maximum-margin hyperplane and support vectors)
Terminology :
Hyperplane : The best decision line or boundary between the classes.
Support vectors : The points from each class that lie closest to the hyperplane.
Margin : The distance between the support vectors and the hyperplane. SVM chooses the hyperplane that maximizes it.
Kernel : A kernel function transforms non-linear data into linearly separable data.
To transform the non-linear data :
• $y = x^2$ (for 1D non-linear data)
By adding this dimension, we get a two-dimensional space.
• $z = x^2 + y^2$ (for 2D non-linear data)
By adding this dimension, we get a three-dimensional space.
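The effect of adding the dimension y = x² can be checked on a toy 1-D dataset. In this sketch (hypothetical data and helper name), class "B" sits between two groups of class "A", so no single cut on x separates them, but a single cut on y = x² does.

```python
def separable_by_threshold(values, labels):
    """True if a single cut point t puts one class at or below t and the other above it."""
    for t in sorted(set(values)):
        below = {lab for v, lab in zip(values, labels) if v <= t}
        above = {lab for v, lab in zip(values, labels) if v > t}
        if len(below) == 1 and len(above) == 1 and below != above:
            return True
    return False


# Hypothetical 1-D data: class "B" lies between two groups of class "A"
xs = [-3, -2.5, -2, -0.5, 0, 0.5, 2, 2.5, 3]
labels = ["A", "A", "A", "B", "B", "B", "A", "A", "A"]
print(separable_by_threshold(xs, labels))  # False: no single cut on x works

# Add the dimension y = x**2; now the line y = 1 separates the classes
ys = [x ** 2 for x in xs]
print(separable_by_threshold(ys, labels))  # True
```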
Radial Basis Function (RBF) :
• It computes the similarity, i.e. how close two points x1 and x2 are to each other:
$$k(x_1, x_2) = \exp\left(-\frac{\lVert x_1 - x_2 \rVert^2}{2\sigma^2}\right)$$
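The RBF kernel formula can be written directly in Python. This is a minimal sketch with a hypothetical function name; σ defaults to 1.0 here as an assumption. The kernel equals 1 for identical points and decays towards 0 as the points move apart.

```python
import math


def rbf_kernel(x1, x2, sigma=1.0):
    """k(x1, x2) = exp(-||x1 - x2||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / (2 * sigma ** 2))


print(rbf_kernel((0, 0), (0, 0)))  # 1.0
print(rbf_kernel((0, 0), (1, 0)))  # exp(-0.5), about 0.61
print(rbf_kernel((0, 0), (3, 0)))  # exp(-4.5), close to 0
```

A larger σ makes the similarity fall off more slowly with distance.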
UNSUPERVISED MACHINE LEARNING :
• Unsupervised learning is the training of a machine using information that is neither
classified nor labeled.
• It groups unsorted information according to similarities, patterns and differences, without any prior training on labelled data.
Hierarchical Clustering :
It creates clusters in a predefined order: similar clusters are grouped together and arranged in a hierarchical (tree-like) manner.
Non Hierarchical Clustering :
It forms new clusters by merging or splitting clusters, and does not follow a tree-like structure as hierarchical clustering does. K-means clustering and DBSCAN are two effective algorithms of this kind.
(Figure: Clusters. Three clusters are formed when the data is uniform, i.e. easily separable with the naked eye.)
What if the data is non-uniform?
DBSCAN (DENSITY-BASED SPATIAL CLUSTERING OF
APPLICATIONS WITH NOISE) :
• The DBSCAN algorithm has a key idea that for each point of a cluster, the
neighborhood of a given radius has to contain at least a minimum number of points.
• DBSCAN algorithm requires two parameters:
Min_pts: The minimum number of points (a threshold) clustered together for a region
to be considered dense.
Eps (ε): A distance measure that will be used to locate the points in the neighborhood
of any point.
• In this algorithm, we have 3 types of data points.
Core Point: A point that has at least MinPts points (including itself) within eps.
Border Point: A point which has fewer than MinPts within eps but
it is in the neighborhood of a core point.
Noise or outlier: A point which is not a core point or border point.
(Figure: DBSCAN example with Min_pts = 4. Red: core points. Green: border points, which are still part of a cluster because they lie within eps of a core point but do not meet the Min_pts criterion themselves. Blue: noise point, not assigned to any cluster.)
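The core, border and noise rules can be sketched as a small DBSCAN in plain Python (3.8+ for `math.dist`). This is an illustrative implementation, not a library's; it assumes Euclidean distance and counts the point itself towards Min_pts. The points are hypothetical: two dense groups, one border point listed first so it is first marked as noise and later absorbed, and one isolated noise point.

```python
import math


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)  # None = not yet visited
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # reached from a core point -> border point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(j_neighbours)
    return labels


points = [(2.2, 1),
          (0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
          (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
          (5, 5)]
print(dbscan(points, eps=1.5, min_pts=4))  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
```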
Important points :
• Other clustering algorithms are suitable mainly for compact, well-separated clusters. For non-uniform data, DBSCAN works much better.
• It is robust to outliers.
• It forms clusters from dense regions; lower-density points are not assigned to any cluster.
• Min_pts should be at least 3.
• DBSCAN uses Euclidean distance by default: $D(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$
IMAGE FEATURE EXTRACTION :
Texture extraction
• Count the number of different intensity levels in the image. This determines the size of the GLCM.
• Find the intermediate matrix A by counting how frequently a pixel with value p occurs in a particular spatial relationship with a pixel with value q.
• Calculate the GLCM by dividing each element of matrix A by the sum of all elements of matrix A.
Color extraction
• Extract three components red,
green and blue from image.
• Convert color image to HSV
image.
• Extract hue, saturation and
intensity of image.
• For each component extracted,
compute mean, variance and
range.
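The color-extraction steps can be sketched with the standard-library `colorsys` module. This assumes pixels are (r, g, b) tuples with values 0 to 255, scaled to [0, 1] before conversion; HSV's third component is "value" (V), used here for the intensity. The function name and the two-pixel image are hypothetical. Six components with three statistics each give 18 numbers.

```python
import colorsys
import statistics

NAMES = ("R", "G", "B", "H", "S", "V")


def color_features(pixels):
    """Mean, variance and range of each R, G, B, H, S, V component.

    pixels is a list of (r, g, b) tuples with values 0-255; all components are
    scaled to [0, 1] (colorsys works on that range).
    """
    components = {name: [] for name in NAMES}
    for r, g, b in pixels:
        rn, gn, bn = r / 255, g / 255, b / 255
        values = (rn, gn, bn) + colorsys.rgb_to_hsv(rn, gn, bn)
        for name, value in zip(NAMES, values):
            components[name].append(value)
    return {
        name: (statistics.mean(v), statistics.pvariance(v), max(v) - min(v))
        for name, v in components.items()
    }


# Hypothetical 2-pixel "image": one black and one white pixel
features = color_features([(0, 0, 0), (255, 255, 255)])
print(features["V"])  # (0.5, 0.25, 1.0)
print(sum(len(stats) for stats in features.values()))  # 18
```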
Grey Level Co-occurrence Matrix (GLCM) example for grey levels 0, 1, 2. Intermediate count matrix:
      0    1    2
0     4    4    1
1     2    0    2
2     1    1    3
The counts sum to 18, so the GLCM is C = (1/18) × (count matrix):
      0      1      2
0     0.22   0.22   0.06
1     0.11   0      0.11
2     0.06   0.06   0.17
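The GLCM steps can be sketched in plain Python. This assumes the spatial relationship is "the pixel immediately to the right" (offset (0, 1)); other offsets are possible. The function names and the small image are hypothetical, and the count matrix is the one from the example above.

```python
def glcm_counts(image, levels, dr=0, dc=1):
    """Count how often grey level p has grey level q at offset (dr, dc).

    The default offset (0, 1) pairs each pixel with its right-hand neighbour.
    """
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    return counts


def normalize(counts):
    """Divide every element by the total so the GLCM entries sum to 1."""
    total = sum(sum(row) for row in counts)
    return [[value / total for value in row] for row in counts]


# Count matrix from the example (grey levels 0-2), total 18
counts = [[4, 4, 1], [2, 0, 2], [1, 1, 3]]
print([[round(v, 2) for v in row] for row in normalize(counts)])
# [[0.22, 0.22, 0.06], [0.11, 0.0, 0.11], [0.06, 0.06, 0.17]]
```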
CASE STUDY-1 :
CLASSIFICATION OF GRAPE LEAVES USING KNN
AND SVM CLASSIFIERS
Anil A. Bharate, M. S. Shirdhonkar (2020)
DATA SOURCE:
• The study proposes a technique to classify grape leaves as healthy or unhealthy.
• The database consisted of 90 images of grape leaves.
Training : 30 images of healthy and 30 images of unhealthy leaves.
Testing : 30 images including both healthy and unhealthy leaves.
• Feature extraction (image processing) :
Texture and color features are extracted; the texture features are obtained using the Grey Level Co-occurrence Matrix (GLCM).
(Figure: a) Healthy leaf, b) Unhealthy leaf)
RESULTS :
Parameter            Proposed method (SVM)    Proposed method (KNN)
Features             4 texture & 18 color     4 texture & 18 color
Classifier           SVM                      KNN
Number of samples    30                       30
(Figure: Comparison of Results, accuracy (%): SVM 90, KNN 96.66)
The KNN model was more accurate than the SVM model. This is attributed to how the two classifiers decide: KNN computes the distance from a point to all stored samples, finds the nearest neighbors and then decides the class, whereas SVM uses only the support vectors to find the hyperplane and then decides the class.
CONCLUSIONS OF CASE STUDY :
• Automation will be a boon for farmers, helping them protect their plants from diseases and increase yield.
• The KNN classifier gives better accuracy than the SVM classifier.
• As future work, the system can be trained to identify the diseases present on grape leaves and also to suggest possible solutions.
• Automatic image-capturing cameras can be installed with the help of government bodies. The captured images can then be used for feature selection and for training and testing several algorithms, choosing the one with the best accuracy for future large-scale identification of infected leaves.
CASE STUDY-2 :
CROP AND FERTILIZER RECOMMENDATION
SYSTEM BASED ON SOIL CLASSIFICATION
Akshatha et al. (2022)
DATA SOURCE :
• The study classifies soil records gathered from GKVK, UAS Bangalore, Karnataka.
• It includes samples from various taluks of Chikkamagaluru district, such as Tarikere, Kadur, Sringeri and Koppa.
Soil samples : 1550 (Training – 70%, Testing – 30%)
Attributes : N, P, K, Ca, Mg, Lime, C, S and moisture.
Algorithm used : SVM, KNN
Soil nutrition class (4 classes)    Crops suggested
Class 0 (low fertile) Beans, green peas, carrot, onion
Class 1 (moderately fertile) Radish, cowpea, cabbage, cauliflower
Class 2 (high fertile) Sugarcane, paddy, bajra, guava
Class 3 (very high fertile) Barley, cotton, tobacco, sunflower
Results :
Ca      Mg      K       S     N       Lime   C      P     Moisture   Class
9.653   6.585   142     108   226.05  5.83   1.29   18    0.9        1
19.88   22.2    339.35  77    308.25  6.45   2      298   0.8        2
2.931   41.22   514.29  108   277.42  6.43   0.74   48    0.6        1
(Figure: Confusion matrix for SVM model, true class vs predicted class)
Correctly classified    Incorrectly classified    Total testing data    Accuracy of SVM
845                     240                       1085                  77.85%
Class-0 labels (1st row) :
58% predicted same
27% misclassified as class-1
1.1% misclassified as class-2
14% misclassified as class-3
CONCLUSIONS OF CASE STUDY :
• The KNN algorithm was also used and gave a lower accuracy of 72.04%.
• The SVM algorithm obtained higher accuracy as it captured the non-linearity in the data.
• Based on the soil class, crops can be recommended.
• This can help farmers grow the crop best suited to their soil condition.
• The model can be improved with more hyperparameter tuning, which may increase its accuracy and ultimately help farmers learn their soil fertility level and the crops suggested for it.
CASE STUDY-3 :
Sugar Cane Crop Yield Estimation Using K-Nearest
Neighbors
Kumar et al.
DATA SOURCE :
• The dataset includes the predictors : Rainfall, pH, Organic Carbon, Area, S, Cu, Fe, P, Mn, N and Fibre.
• Dependent variable : Yield (tons)
• Crop considered : Sugarcane
• State : Telangana
• Period : 1901 to 2016, annual data.
• Data were re-scaled before analysis.
RESULTS :
CONCLUSIONS OF CASE STUDY :
• Accurate yield predictions across different areas can help farmers get better profit from their crops.
• KNN can be an alternative approach for regression, although it is used mostly for classification problems.
• In future, predictions can be made using different algorithms and their accuracies compared to choose the best among them.
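KNN regression, as used in this case study, predicts a numeric target as the average target of the K nearest rows. The sketch below assumes min-max scaling (the study only says the data were re-scaled, so the method is an assumption), Euclidean distance and K = 3. The function names and the rows are made up; they are not the Telangana data.

```python
import math


def min_max_bounds(rows):
    """(min, max) of each feature column, learned from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def scale_row(row, bounds):
    """Map each feature to [0, 1] using the training bounds."""
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, bounds)]


def knn_regress(train_x, train_y, query, k=3):
    """Predict a numeric target as the mean target of the k nearest rows (Euclidean)."""
    nearest = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Made-up rows: [rainfall (mm), soil pH] and yield (tons); not data from the study
rows = [[800, 6.5], [900, 7.0], [1000, 6.8], [1200, 7.2], [1300, 7.5]]
yields = [70, 75, 80, 90, 95]

bounds = min_max_bounds(rows)
train_scaled = [scale_row(r, bounds) for r in rows]
query = scale_row([1100, 7.0], bounds)  # new data is scaled with the training bounds
pred = knn_regress(train_scaled, yields, query, k=3)
print(round(pred, 2))  # 81.67
```

Scaling puts rainfall (hundreds of mm) and pH (single digits) on a comparable footing, so neither dominates the distance purely because of its units.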
CASE STUDY-4 :
MULTIVARIATE WEATHER ANOMALY DETECTION
USING DBSCAN CLUSTERING ALGORITHM
Wibisono et al.
DATA SOURCE :
• Dataset : 8 attributes from daily weather data.
• Place : Semarang city, Indonesia.
• Algorithms used : DBSCAN & PCA
Attribute                      Data type
Min. temperature               Numerical
Max. temperature               Numerical
Average temperature            Numerical
Average humidity (%)           Numerical
Sun exposure time (hours)      Numerical
Maximum wind speed (m/s)       Numerical
Average wind speed (m/s)       Numerical
Rainfall (mm)                  Numerical
RESULTS :
(Figure: DBSCAN result with eps = 0.19)
PC1 mainly consisted of : Avg. temperature, Max. temperature and Avg. humidity.
PC2 mainly consisted of Tn (minimum temperature).
• The results showed that anomalous weather is characterized by high humidity and low temperature.
CONCLUSIONS :
• The experimental results demonstrated that DBSCAN is capable of identifying peculiar data points that deviate from the ‘normal’ data distribution.
• The anomalous weather was characterized by high humidity and low temperature.
• PCA can be used together with DBSCAN to detect noise.
CONCLUSIONS OF MACHINE LEARNING :
• No algorithm is appropriate for all situations.
• Choosing a technique depends on the pattern and type of data and on the experience of the analyst.
• Using ML algorithms as a pipeline can save the analyst's time and give fast solutions to farmers.
• There is wide scope for applying ML in agriculture, especially in plant disease classification, soil or crop classification and crop yield prediction.
• Automation can help reduce the biotic and abiotic stresses prevailing in fields across the country.
REFERENCES :
• Akshatha, G.C. and Shastry, K.A., 2022. Crop and fertilizer recommendation system based on soil classification. In Recent Advances in Artificial Intelligence and Data Engineering (pp. 29-40).
• Bharate, A.A. and Shirdhonkar, M.S., 2020. Classification of grape leaves using KNN and SVM classifiers. In 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC) (pp. 745-749).
• Kumar, N.N. and Balakrishnan, M., 2018. Sugar cane crop yield estimation using K-Nearest Neighbors. Journal of Advanced Research in Dynamical and Control Systems, 10(4), pp. 199-207.
• Wibisono, S., Anwar, M.T., Supriyanto, A. and Amin, I.H.A., 2021. Multivariate weather anomaly detection using DBSCAN clustering algorithm. Journal of Physics: Conference Series, 1869(1), p. 012077.
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 

Recently uploaded (20)

CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdfMarket Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
Market Analysis in the 5 Largest Economic Countries in Southeast Asia.pdf
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 

Application of Machine Learning in Agriculture

  • 2. Master Seminar-I Application of Machine Learning in Agriculture Aman Vasisht PGS20AGR8404 Dept. of Agricultural Statistics UNIVERSITY OF AGRICULTURAL SCIENCES, DHARWAD COLLEGE OF AGRICULTURE, DHARWAD
  • 3. OUTLINE MACHINE LEARNING AND ITS APPLICATIONS TYPES OF MACHINE LEARNING ALGORITHMS CASE STUDIES CONCLUSION REFERENCES
  • 4. QUICK QUESTIONNAIRE How many of you have heard about Machine Learning? How many of you know about Machine Learning? How many of you are using Machine Learning?
  • 5. What is Machine Learning? • It is the science of programming computers so that they can learn from data. • It is a type of AI that allows applications to become more accurate at predicting outcomes. (Figure : Artificial Intelligence, Machine Learning, Deep Learning and Data Science.) AI : programs with the ability to learn and reason like humans. ML : algorithms with the ability to learn and make informed decisions. DL : artificial neural networks that adapt and learn from vast amounts of data.
  • 6. APPLICATIONS IN AGRICULTURE : • Yield Prediction : an accurate model can help farm owners make informed management decisions for their farms. • Disease Detection : algorithms can help identify diseased plants with good accuracy. • Crop Quality : commodities can be graded using measurable quality parameters. • Livestock Management : farms can be managed according to controlled conditions and parameters.
  • 7. TERMINOLOGY : • Features : the distinct, measurable traits used to describe each example quantitatively. • Label or Target : the outcome variable, whose value depends on the features. • Training : fitting the algorithm to a dataset of examples. • Testing : checking the accuracy of the model's predictions on data not used for training. Let's understand this better with an example.
  • 8. Training : Apple
      Object 1 features : 1. Color : Reddish 2. Type : Fruit 3. Shape, etc.
      Object 2 features : 1. Color : Greyish 2. Type : Company logo 3. Shape, etc.
      Object 3 features : 1. Color : Yellowish 2. Type : Fruit 3. Shape, etc.
  • 10. TYPES OF MACHINE LEARNING : • Supervised Learning : we predict future outcomes based on past data. Both features and labels must be given to the model for it to be trained. • Unsupervised Learning : we identify hidden patterns in the input data. Organizing the data in this way makes patterns, similarities or anomalies more evident.
  • 11. SUPERVISED LEARNING : • Let the feature variables be X and the output (label) variable be Y. An algorithm is used to learn the mapping function from the input to the output, Y = f(X). The goal is to approximate the mapping function so well that, given new input data X, we can predict the output variable Y for that data. Classification : a classification problem is one in which the output variable is a category, such as effective or non-effective, disease or no disease, etc.
  • 12. MAJOR ALGORITHMS : Classification : • KNN • SVM • Logistic Regression • Decision Tree Classifier • Naive Bayes. Regression : a regression problem is one in which the output variable is a real value, such as “yield” or “weight”. Algorithms in this category include : • Linear Regression • Multiple Regression • Polynomial Regression • Lasso Regression • Ridge Regression
  • 13. UNSUPERVISED LEARNING : • Unsupervised learning is where you only have input data (X) and no corresponding output variables. • The goal of unsupervised learning is to model the underlying structure or distribution of the data in order to learn more about it. • Clustering : a clustering problem is one where you want to discover the inherent groupings in the data, such as grouping customers by purchasing behavior. Algorithms : • DBSCAN • K-means clustering • Hierarchical clustering
  • 14. (Figure : Supervised vs. unsupervised learning. Supervised : input data ("These are known fruits") → model → prediction ("It's an apple"). Unsupervised : input data → model.)
  • 15. GOOD OR BAD MACHINE LEARNING MODEL : • The main goal of every machine learning model is to generalize well. • Generalization is the ability of a model trained on one dataset to produce reliable and accurate output on new, unseen data. • Underfitting and overfitting are the two conditions to check when judging a model's performance and whether it generalizes well. Before looking at overfitting and underfitting, let's understand two basic terms : bias and variance.
  • 16. Bias-Variance Tradeoff : Let Y be the dependent variable and X the independent variable, with $Y = f(X) + \epsilon$, where $\epsilon \sim N(0, \sigma_\epsilon^2)$. We may estimate a model $\hat{f}(X)$ of $f(X)$ using regression. The expected squared prediction error at a point $x$ is $\mathrm{Err}(x) = E\left[(Y - \hat{f}(x))^2\right]$. This error can be decomposed into bias and variance components : $\mathrm{Err}(x) = \left(E[\hat{f}(x)] - f(x)\right)^2 + E\left[\left(\hat{f}(x) - E[\hat{f}(x)]\right)^2\right] + \sigma_\epsilon^2$, i.e. Err(x) = Bias² + Variance + Irreducible Error. (Figure : low/high bias versus low/high variance.)
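The decomposition can be checked numerically. The sketch below is a Monte Carlo illustration under hypothetical choices: the true function f(x) = sin(2πx), the noise level σ_ε = 0.3, the sample sizes, and two toy estimators (a global mean that ignores X, and a 1-nearest-neighbour rule). Over many fresh training sets it records each estimator's prediction at one point x0; the mean squared error of f̂(x0) about f(x0) splits exactly into Bias² + Variance, and adding σ_ε² gives Err(x0).

```python
import math
import random
import statistics


def true_f(x):
    """Hypothetical true relationship f(X), chosen only for this simulation."""
    return math.sin(2 * math.pi * x)


def simulate(estimator, x0, n_trials=2000, n_points=30, sigma=0.3, seed=0):
    """Refit `estimator` on many fresh training sets; return (Bias^2, Variance, MSE about f(x0))."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_trials):
        xs = [rng.random() for _ in range(n_points)]
        ys = [true_f(x) + rng.gauss(0, sigma) for x in xs]   # Y = f(X) + eps
        preds.append(estimator(xs, ys, x0))
    f0 = true_f(x0)
    bias_sq = (statistics.fmean(preds) - f0) ** 2            # (E[f_hat] - f)^2
    variance = statistics.pvariance(preds)                   # E[(f_hat - E[f_hat])^2]
    mse = statistics.fmean((p - f0) ** 2 for p in preds)     # E[(f_hat - f)^2]
    return bias_sq, variance, mse


def global_mean(xs, ys, x0):
    """Ignores X entirely: a rigid model with high bias and low variance."""
    return statistics.fmean(ys)


def nearest_neighbour(xs, ys, x0):
    """Copies the label of the closest training x: a flexible model with low bias, high variance."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]


sigma = 0.3
for name, est in [("global mean", global_mean), ("1-nearest neighbour", nearest_neighbour)]:
    b2, var, mse = simulate(est, x0=0.25, sigma=sigma)
    # Err(x0) = Bias^2 + Variance + irreducible error sigma^2
    print(f"{name}: Bias^2={b2:.3f}  Variance={var:.3f}  Err={b2 + var + sigma ** 2:.3f}")
```

The global mean should show a large Bias² and a small Variance, and the nearest-neighbour rule the reverse: the tradeoff in miniature.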
  • 18. SIMPLE REGRESSION : • Linear regression is one of the simplest and most popular machine learning algorithms for predictive analysis. • $y = a_0 + a_1 x + \epsilon$, where y is the dependent (target) variable, x is the independent (predictor) variable, $a_0$ is the intercept of the line and $a_1$ is the linear regression coefficient. We wish to find $a_0$ and $a_1$ such that $\sum_i \left(y_i - (a_0 + a_1 x_i)\right)^2$ is minimum. The least-squares solution is $a_0 = \bar{y} - a_1 \bar{x}$ and $a_1 = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$.
  • 19. (Figure : scatter plot of Y against X, with best-fit line y = 0.7019x + 2.4094, R² = 0.8363.) Imagine we add a couple of large values to the data : will that affect the regression line? Let's check. (Figure : best-fit line after adding outliers, y = 1.5952x − 4.3564, R² = 0.4629.)
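The least-squares formulas for a0 and a1 translate directly into plain Python. This is a minimal sketch; the data points are hypothetical (they lie exactly on y = 2 + 0.5x) and are not the data behind the plots. Refitting after appending a single large value shows the same effect as the outlier plot: the slope rises and R² falls.

```python
def fit_simple_regression(xs, ys):
    """Least-squares intercept a0 and slope a1 for y = a0 + a1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a1 = s_xy / s_xx            # a1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
    a0 = y_bar - a1 * x_bar     # a0 = y_bar - a1 * x_bar
    return a0, a1


def r_squared(xs, ys, a0, a1):
    """Share of the variance in y explained by the fitted line."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


xs = [0, 2, 4, 6, 8]
ys = [2, 3, 4, 5, 6]                          # hypothetical points on y = 2 + 0.5x
a0, a1 = fit_simple_regression(xs, ys)
print(a0, a1, r_squared(xs, ys, a0, a1))      # 2.0 0.5 1.0

# Append one large value and refit: the line is pulled towards it
xs_out, ys_out = xs + [10], ys + [40]
b0, b1 = fit_simple_regression(xs_out, ys_out)
print(round(b0, 3), round(b1, 3), round(r_squared(xs_out, ys_out, b0, b1), 3))
```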
  • 20. Detection of outliers in a machine learning model : • Using the Z score : • The Z score shows whether a data value is greater or smaller than the mean, and by how many standard deviations. • $Z = \dfrac{x - \bar{x}}{\sigma}$ • Values outside $\bar{x} \pm 3\sigma$ (i.e. $|Z| > 3$) are considered outliers. Q. What is the most appropriate measure of central tendency when the data has outliers? The median is usually preferred in these situations, because the mean can be distorted by the outliers.
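A minimal sketch of the 3σ rule using Python's standard `statistics` module; the readings are hypothetical. The last line also contrasts the median with the mean, which the outlier drags upwards.

```python
import statistics


def zscore_outliers(values, threshold=3.0):
    """Return the values whose |Z| = |x - mean| / sd exceeds `threshold`."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)   # population standard deviation (sigma)
    if sd == 0:                      # all values identical: nothing can be an outlier
        return []
    return [x for x in values if abs(x - mean) / sd > threshold]


readings = [10] * 20 + [100]         # hypothetical measurements with one extreme value
print(zscore_outliers(readings))     # [100]
print(statistics.median(readings), round(statistics.fmean(readings), 2))  # 10 14.29
```

One caveat: with the population standard deviation, no value in a sample of n points can have |Z| larger than √(n − 1), so in samples of 10 or fewer observations the 3σ rule cannot flag anything.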
  • 21. • Inter-Quartile Range (IQR) proximity rule : data points that fall below Q1 − 1.5 IQR or above Q3 + 1.5 IQR are outliers, where IQR = Q3 − Q1. The box plot, also called a box-and-whisker plot, is a graphical method based on this rule : points beyond the whiskers are flagged as potential outliers, which can then be examined or removed from the data series. (Figure : box plots of crop production and crop area.)
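The proximity rule takes a few lines with the standard `statistics` module. This is a sketch on hypothetical values; software packages compute quartiles in slightly different ways (here `method="inclusive"`), which can shift the fences a little.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Tukey's rule: flag points below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < lower or x > upper], (lower, upper)


areas = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # hypothetical crop areas with one extreme value
outliers, (low, high) = iqr_outliers(areas)
print(outliers, low, high)                 # [100] -3.5 14.5
```

With these ten values the ±3σ rule from the previous slide would not flag 100 (its Z score is just under 3), while the IQR rule does, because the quartiles are not pulled by the extreme value the way the mean and standard deviation are.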
  • 22. ASSUMPTIONS OF REGRESSION ANALYSIS IN ML : • Linear and additive relationship • No autocorrelation • No multicollinearity • Homoscedasticity • Normal distribution of errors. These assumptions are often violated, and a violation that the researcher overlooks can make the model unreliable for prediction.
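Two of these assumptions are commonly checked with simple statistics. The Durbin-Watson statistic tests residuals for autocorrelation: values near 2 suggest none, values near 0 suggest positive and values near 4 negative autocorrelation. The variance inflation factor (VIF) measures multicollinearity; a common rule of thumb treats values above about 10 as a warning sign. The sketch below computes both in plain Python on hypothetical inputs. For simplicity, the VIF function handles only the two-predictor case, where regressing one predictor on the other gives R² = r², so VIF = 1/(1 − r²).

```python
import math


def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); ranges from 0 to 4."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for either predictor in a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1 / (1 - r ** 2)


# Hypothetical residual series
print(durbin_watson([1, 1, 1, 1, 1, -1, -1, -1, -1, -1]))  # 0.4 (positive autocorrelation)
print(durbin_watson([1, -1] * 5))                         # 3.6 (negative autocorrelation)

# Hypothetical predictors: one unrelated to rainfall, one nearly a copy of it
rain = [1, 2, 3, 4, 5]
print(vif_two_predictors(rain, [1, -1, 0, -1, 1]))           # 1.0 (uncorrelated)
print(vif_two_predictors(rain, [2.0, 4.1, 5.9, 8.0, 10.1]))  # very large (near-collinear)
```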
  • 23. REGULARIZATION : • Regularization is an important technique for avoiding overfitting, especially when the model performs much better on the training data than on the test data. • Two types : L2 (ridge regression) and L1 (lasso regression). L2 Ridge regression : • It performs L2 regularization, i.e. it adds a penalty equal to the sum of the squared magnitudes of the coefficients $\beta_j$. Thus, it optimizes : $\text{Objective} = \text{RSS} + \lambda \sum_j \beta_j^2$
  • 24. Here RSS is the loss term and λ * (sum of squares of coefficients) is the penalty term. • λ is the tuning parameter that balances the emphasis on minimizing RSS against minimizing the sum of squared coefficients. • It is used to prevent overfitting, and is especially useful when the predictors are multicollinear. • It reduces model complexity by shrinking the coefficients. L1 Lasso regression : LASSO stands for Least Absolute Shrinkage and Selection Operator. • Lasso regression performs L1 regularization, i.e. it adds the sum of the absolute values of the coefficients to the optimization objective.
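  • 25. • Objective = RSS + λ * (sum of absolute values of coefficients), i.e. $\text{RSS} + \lambda \sum_j |\beta_j|$, where $\beta_j$ are the coefficients. Lasso is generally preferred when there are many features, because the L1 penalty can shrink some coefficients exactly to zero and so performs automatic feature selection; this is its main advantage over ridge regression. (Figure : lasso and ridge constraint regions, with RSS contours growing as they move away from the minimum.)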
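Both objectives can be minimized by coordinate descent, updating one coefficient at a time while holding the others fixed. The sketch below is one such plain-Python implementation; it does not reproduce how any particular library works. It centres X and y so the intercept is not penalized, and uses the two objectives as written on these slides (RSS + λΣβ_j² and RSS + λΣ|β_j|). Setting each partial derivative to zero gives the ridge update β_j = ρ_j / (z_j + λ) and the lasso update β_j = S(ρ_j, λ/2) / z_j. Here ρ_j is the inner product of feature j with the partial residual, z_j is the squared norm of feature j, and S is the soft-thresholding function. The data are hypothetical: y = 1 + 3·x1 exactly, and x2 is irrelevant.

```python
def soft_threshold(rho, t):
    """S(rho, t) = sign(rho) * max(|rho| - t, 0)."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def fit_penalized(X, y, lam, penalty="l2", n_iter=200):
    """Coordinate descent for RSS + lam*sum(b_j^2) ("l2", ridge) or RSS + lam*sum(|b_j|) ("l1", lasso).

    X is a list of rows. Columns and y are centred, so the intercept is not penalised.
    """
    if penalty not in ("l1", "l2"):
        raise ValueError("penalty must be 'l1' or 'l2'")
    n, p = len(X), len(X[0])
    x_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    z = [sum(row[j] ** 2 for row in Xc) for j in range(p)]   # squared norm of each column
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # rho_j: feature j against the residual that ignores feature j's own contribution
            rho = sum(
                Xc[i][j] * (yc[i] - sum(Xc[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            )
            if penalty == "l2":
                beta[j] = rho / (z[j] + lam)
            else:
                beta[j] = soft_threshold(rho, lam / 2) / z[j]
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_means))
    return intercept, beta


# Hypothetical data: y depends on x1 only (y = 1 + 3*x1); x2 is irrelevant
X = [[1, 1], [2, 0], [3, 1], [4, 0], [5, 1], [6, 0]]
y = [1 + 3 * x1 for x1, _ in X]
for pen in ("l2", "l1"):
    b0, b = fit_penalized(X, y, lam=10.0, penalty=pen)
    print(pen, round(b0, 3), [round(v, 3) for v in b])
```

With λ = 10, ridge shrinks both coefficients but leaves the irrelevant x2 slightly non-zero, while lasso sets it exactly to zero.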
  • 26. CLASSIFICATION : • A common task for machine learning algorithms is to recognize objects and separate them into categories. KNN (K-Nearest Neighbor) algorithm : • K-NN is a non-parametric algorithm. • It is also called a lazy learner algorithm. • In the training phase, KNN simply stores the dataset; when it receives new data, it classifies that data into a category based on a distance measure. • One such measure is the Minkowski distance, $D(p, q) = \left(\sum_{i=1}^{n} |p_i - q_i|^c\right)^{1/c}$, where c is a parameter and p, q are two points.
  • 27. Euclidean distance (c = 2) : $\sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$. Manhattan distance (c = 1) : $\sum_{i=1}^{n} |p_i - q_i|$. (Figure : points P and Q, and a new point classified by its K = 5 nearest neighbours.) • The number of neighbours K is generally taken as odd (3, 5, etc.) so that votes between two classes cannot tie. • Very simple. • Works with any number of classes. • Rescaling is very important, as it is a distance-based algorithm.
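A minimal KNN classifier needs only a distance function and a majority vote. The sketch below is plain Python; the feature names and values are hypothetical, and we assume they are already rescaled to comparable ranges, since KNN is distance-based.

```python
from collections import Counter


def minkowski(p, q, c=2):
    """Minkowski distance (sum |p_i - q_i|^c)^(1/c): c=2 is Euclidean, c=1 is Manhattan."""
    return sum(abs(pi - qi) ** c for pi, qi in zip(p, q)) ** (1 / c)


def knn_predict(train_X, train_y, query, k=3, c=2):
    """Classify `query` by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: minkowski(pair[0], query, c))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, already-rescaled leaf features: (greenness, spotted-area fraction)
train_X = [(0.90, 0.05), (0.85, 0.10), (0.80, 0.08), (0.30, 0.60), (0.35, 0.70), (0.25, 0.55)]
train_y = ["healthy", "healthy", "healthy", "unhealthy", "unhealthy", "unhealthy"]
print(minkowski((0, 0), (3, 4)), minkowski((0, 0), (3, 4), c=1))  # 5.0 7.0
print(knn_predict(train_X, train_y, (0.82, 0.12), k=3))            # healthy
```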
  • 28. Accuracy : a confusion matrix compares actual and predicted values (rows : actual value; columns : predicted value 0, 1) :
      Actual 0 : TN, FP
      Actual 1 : FN, TP
    where TN = True Negative, FP = False Positive, FN = False Negative and TP = True Positive. Let's see an example with n = 150 :
      Actual healthy : 40 predicted healthy, 10 predicted unhealthy
      Actual unhealthy : 5 predicted healthy, 95 predicted unhealthy
    Accuracy = correctly predicted / (TN + FP + FN + TP) and Error rate = wrongly predicted / (TN + FP + FN + TP). Therefore, Accuracy = (40 + 95)/(40 + 10 + 5 + 95) = 0.9, or 90%, and Error rate = 15/150 = 0.1. But how do we know which number to take as K? Is it 5, 7 or some other number?
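The counts and both rates can be computed directly from lists of actual and predicted labels. Below is a plain-Python sketch that rebuilds the healthy/unhealthy example. Treating "unhealthy" as the positive class is an assumption, chosen to match the TN/FP/FN/TP layout above.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (TN, FP, FN, TP) for a binary problem, given which label is 'positive'."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1
            else:
                fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def accuracy_and_error(tn, fp, fn, tp):
    total = tn + fp + fn + tp
    return (tn + tp) / total, (fp + fn) / total   # share correct, share wrong


y_true = ["healthy"] * 50 + ["unhealthy"] * 100
y_pred = ["healthy"] * 40 + ["unhealthy"] * 10 + ["healthy"] * 5 + ["unhealthy"] * 95
counts = confusion_counts(y_true, y_pred, positive="unhealthy")
print(counts, accuracy_and_error(*counts))  # (40, 10, 5, 95) (0.9, 0.1)
```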
  • 29. Support Vector Machine (SVM) : • In the SVM algorithm, each data item is plotted as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. • The goal is to find the decision boundary that separates the classes. Two types : • Linear SVM : used when the dataset can be separated into two classes by a single straight line. • Non-linear SVM : used when the dataset cannot be separated by a straight line. (Figure : linearly and non-linearly separable data.)
  • 30. (Figure : maximum-margin hyperplane, its margin and the support vectors.) Terminology : Hyperplane : the best decision line or boundary. Support vectors : the points from each class that lie closest to the hyperplane. Margin : the distance between the hyperplane and the support vectors; SVM chooses the hyperplane that maximizes it. Kernel : a kernel function transforms non-linearly separable data so that it becomes linearly separable.
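To make the terminology concrete, here is a minimal linear SVM trained by full-batch subgradient descent on the soft-margin objective λ/2·‖w‖² + mean(max(0, 1 − y_i(w·x_i + b))), with labels in {−1, +1}. This is a teaching sketch, not the quadratic-programming solvers SVM libraries typically use. The learning rate, λ and data are hypothetical. Plain subgradient descent only hovers around the optimum, so the printed margin width (2/‖w‖) is approximate.

```python
import math


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=1000):
    """Subgradient descent on lam/2*||w||^2 + mean(max(0, 1 - y_i*(w.x_i + b)))."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]          # gradient of the regularisation term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:                   # inside the margin or misclassified: hinge is active
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def svm_predict(w, b, x):
    """Which side of the hyperplane w.x + b = 0 the point falls on."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


X = [(4, 5), (5, 4), (6, 6), (5, 6), (-4, -5), (-5, -4), (-6, -6), (-5, -6)]  # hypothetical
y = [1, 1, 1, 1, -1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print([svm_predict(w, b, x) for x in X])   # [1, 1, 1, 1, -1, -1, -1, -1]
print("approximate margin width:", 2 / math.sqrt(sum(wj ** 2 for wj in w)))
```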
  • 31. To transform non-linear data : • $y = x^2$ (for 1D non-linear data) : adding this dimension gives a two-dimensional space. • $z = x^2 + y^2$ (for 2D non-linear data) : adding this dimension gives a three-dimensional space. Radial Basis Function (RBF) : • It computes the similarity, i.e. how close points $x_1$ and $x_2$ are to each other : $k(x_1, x_2) = \exp\left(-\dfrac{\|x_1 - x_2\|^2}{2\sigma^2}\right)$
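The two ideas on this slide, adding a dimension and measuring similarity with the RBF kernel, can be tried in a few lines of plain Python. The data and σ below are hypothetical. A kernel SVM never builds the lifted coordinates explicitly; the kernel supplies the similarities directly.

```python
import math


def rbf_kernel(x1, x2, sigma=1.0):
    """k(x1, x2) = exp(-||x1 - x2||^2 / (2 * sigma^2)): 1 for identical points, -> 0 far apart."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / (2 * sigma ** 2))


# Hypothetical 1-D data: class 1 at both ends, class 0 in the middle,
# so no single threshold on x separates the classes.
xs = [-3, -2.5, -1, -0.5, 0, 0.5, 1, 2.5, 3]
labels = [1, 1, 0, 0, 0, 0, 0, 1, 1]

# Add the dimension y = x^2: in the (x, y) plane the line y = 2 now separates them.
lifted = [(x, x ** 2) for x in xs]
print(all((y > 2) == (lab == 1) for (_, y), lab in zip(lifted, labels)))   # True

print(rbf_kernel((0.0,), (0.0,)), round(rbf_kernel((0.0,), (1.0,)), 4))    # 1.0 0.6065
```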
  • 32. UNSUPERVISED MACHINE LEARNING : • Unsupervised learning is the training of a machine using information that is neither classified nor labeled. • It groups unsorted information according to similarities, patterns and differences, without any prior training on labeled data. Hierarchical Clustering : clusters are built step by step, by successively merging similar clusters (or splitting dissimilar ones), and are arranged in a tree-like hierarchy. Non-Hierarchical Clustering : the data is partitioned directly into a set of clusters, which are refined by reassigning points; it does not build a tree-like structure as hierarchical clustering does. K-means clustering and DBSCAN are two effective algorithms.
  • 33. (Figure : 3 clusters formed when the data is uniform, i.e. easily separable with the naked eye.) What if the data is non-uniform?
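For compact, well-separated groups like these, K-means (Lloyd's algorithm) alternates two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The sketch below is plain Python with hypothetical points. To keep it deterministic, it seeds the centroids with a farthest-point rule. That seeding is a choice made for this sketch; common implementations use random or k-means++ seeding.

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm; returns (labels, centroids)."""
    # Farthest-point seeding: start from the first point, then repeatedly add the
    # point farthest from its nearest existing centroid.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(tuple(far))
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[j])   # an empty cluster keeps its old centroid
        if new_centroids == centroids:               # converged: nothing moved
            break
        centroids = new_centroids
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (1, 1),      # hypothetical group A
          (10, 10), (10, 11), (11, 10),        # hypothetical group B
          (20, 0), (21, 0), (20, 1)]           # hypothetical group C
labels, centroids = kmeans(points, k=3)
print(labels)   # [0, 0, 0, 0, 2, 2, 2, 1, 1, 1]
```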
  • 34. DBSCAN (DENSITY-BASED SPATIAL CLUSTERING OF APPLICATIONS WITH NOISE) : • The key idea of DBSCAN is that, for each point of a cluster, the neighborhood of a given radius must contain at least a minimum number of points. • DBSCAN requires two parameters : MinPts : the minimum number of points (a threshold) clustered together for a region to be considered dense. Eps (ε) : the distance used to locate the points in the neighborhood of any point. • The algorithm distinguishes 3 types of data points. Core point : a point with at least MinPts points (including itself) within eps. Border point : a point with fewer than MinPts points within eps that lies in the neighborhood of a core point. Noise or outlier : a point that is neither a core point nor a border point.
  • 35. (Figure : DBSCAN with MinPts = 4. Red : core points. Green : border points, which are still part of the cluster because they are within eps of a core point, but do not meet the MinPts criterion. Blue : noise point, not assigned to any cluster.) Important points : • Other clustering methods are suitable mainly for compact, well-separated clusters; for non-uniform data, DBSCAN is much better. • It is robust to outliers. • It builds clusters from dense regions; points in low-density regions are not assigned to any cluster. • MinPts should be at least 3. • DBSCAN commonly uses Euclidean distance by default, $\sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$.
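The core/border/noise rules translate into a short plain-Python DBSCAN. This is a minimal O(n²) sketch on hypothetical points, not an optimized implementation. Following the usual convention, a point's eps-neighbourhood includes the point itself, and noise points get the label −1.

```python
import math


def dbscan(points, eps, min_pts):
    """Return a cluster id (0, 1, ...) per point, or -1 for noise."""
    n = len(points)
    # eps-neighbourhood of every point (Euclidean distance, point itself included)
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbours]
    labels = [-1] * n              # everything starts as noise until a cluster claims it
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue               # only unclaimed core points can start a new cluster
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            for m in neighbours[j]:
                if labels[m] == -1:
                    labels[m] = cluster    # core or border point joins the cluster
                    if is_core[m]:
                        stack.append(m)    # only core points extend the cluster further
        cluster += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 0),   # dense group plus one border point
          (10, 10), (10, 11), (11, 10), (11, 11),     # second dense group
          (5, 5)]                                     # isolated point
print(dbscan(points, eps=1.5, min_pts=3))   # [0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The point (2.2, 0) has only one other point within eps, so it is not a core point. It still joins cluster 0 as a border point because it lies within eps of the core point (1, 0).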
  • 36. IMAGE FEATURE EXTRACTION : Texture extraction, using the Grey Level Co-occurrence Matrix (GLCM) : • Find the number of different intensity levels in the image; this determines the size of the GLCM. • Find an intermediate matrix A by counting how frequently a pixel with value p occurs in a particular spatial relationship with a pixel with value q. • Calculate the GLCM by dividing each element of matrix A by the sum of the elements of A. Color extraction : • Extract the three components red, green and blue from the image. • Convert the color image to an HSV image. • Extract the hue, saturation and intensity (value) of the image. • For each extracted component, compute the mean, variance and range.
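  • 37. Example : for grey levels 0, 1 and 2, the co-occurrence counts are $C = \begin{pmatrix} 4 & 4 & 1 \\ 2 & 0 & 2 \\ 1 & 1 & 3 \end{pmatrix}$. Since the elements sum to 18, $\text{GLCM} = \frac{1}{18} C \approx \begin{pmatrix} 0.22 & 0.22 & 0.06 \\ 0.11 & 0 & 0.11 \\ 0.06 & 0.06 & 0.17 \end{pmatrix}$.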
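Both feature-extraction recipes can be sketched in plain Python. The GLCM function counts co-occurrences for a single offset and then divides by the total, as in the steps listed. Pairing each pixel with its right-hand neighbour is an assumption of this sketch; real pipelines often combine several offsets. The contrast statistic is shown only as one commonly used texture measure, not necessarily one of those used in the case study. The colour function computes the mean, variance and range of R, G, B and of H, S, V (via the standard `colorsys` module), giving 6 × 3 = 18 values. The tiny image and pixel lists are hypothetical.

```python
import colorsys
import statistics


def glcm(image, levels, offset=(0, 1)):
    """Normalised grey-level co-occurrence matrix for one offset (row step, column step)."""
    dr, dc = offset
    counts = [[0] * levels for _ in range(levels)]          # intermediate matrix A
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:           # skip pairs that fall off the image
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]     # divide A by the sum of its elements


def glcm_contrast(p):
    """Contrast = sum over (i, j) of (i - j)^2 * p[i][j]."""
    return sum((i - j) ** 2 * v for i, row in enumerate(p) for j, v in enumerate(row))


def colour_features(pixels):
    """Mean, variance and range of R, G, B (0-255) and H, S, V (0-1) over (r, g, b) pixels."""
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    components = list(zip(*pixels)) + list(zip(*hsv))       # R, G, B, H, S, V
    features = []
    for comp in components:
        features += [statistics.fmean(comp), statistics.pvariance(comp), max(comp) - min(comp)]
    return features


image = [[0, 0, 1],
         [1, 2, 2],
         [0, 1, 2]]                      # hypothetical 3x3 image with grey levels 0-2
p = glcm(image, levels=3)
print([[round(v, 2) for v in row] for row in p])   # [[0.17, 0.33, 0.0], [0.0, 0.0, 0.33], [0.0, 0.0, 0.17]]
print(round(glcm_contrast(p), 3))                  # 0.667
print(len(colour_features([(120, 200, 60), (90, 180, 40), (150, 140, 70)])))  # 18
```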
  • 38. CASE STUDY-1 : CLASSIFICATION OF GRAPE LEAVES USING KNN AND SVM CLASSIFIERS Anil A. Bharate, M. S. Shirdhonkar (2020)
  • 39. DATA SOURCE : • This case study proposes a technique to classify grape leaves as healthy or unhealthy. • The database consisted of 90 images of grape leaves. Training : 30 images of healthy and 30 images of unhealthy leaves. Testing : 30 images of healthy and unhealthy leaves. • Feature extraction (image processing) : texture features are extracted using the Grey Level Co-occurrence Matrix (GLCM), and color features from the color components of the image. (Figure : a) healthy leaf, b) unhealthy leaf.)
  • 40. RESULTS : Comparison of results for the proposed method :
      Features : 4 texture & 18 color (SVM); 4 texture & 18 color (KNN)
      Number of samples : 30 (SVM); 30 (KNN)
      Accuracy (%) : 90 (SVM); 96.66 (KNN)
    The accuracy of KNN is higher than that of the SVM model. This is attributed to KNN computing the distance from a point to all its neighbors, finding the nearest neighbors and then deciding the class, whereas SVM considers only the support vectors to find the hyperplane and then decides the class.
  • 41. CONCLUSIONS OF CASE STUDY : • Automation can be a boon for farmers, helping them protect their plants from diseases and increase yield. • The KNN classifier gives better accuracy than the SVM classifier. • As future work, the system could be trained to identify the diseases present on grape leaves and to suggest possible remedies. • Automatic image-capturing cameras could be installed with the help of government bodies. The captured images could then be sent for feature selection, several algorithms trained and tested on them, and the algorithm with the best accuracy selected for future identification of the extent of leaf infection.
  • 42. CASE STUDY-2 : CROP AND FERTILIZER RECOMMENDATION SYSTEM BASED ON SOIL CLASSIFICATION Akshatha and Shastry (2022)
  • 43. DATA SOURCE : • The case study focuses on classifying soil records gathered from GKVK, UAS Bangalore, Karnataka. • It includes samples from various taluks of Chikkamagaluru district, such as Tarikere, Kadur, Sringeri and Koppa. Soil samples : 1550 (training 70%, testing 30%). Attributes : N, P, K, Ca, Mg, lime, C, S and moisture. Algorithms used : SVM, KNN. Soil nutrition is classified into 4 classes, with suggested crops for each :
      Class 0 (low fertility) : beans, green peas, carrot, onion
      Class 1 (moderate fertility) : radish, cowpea, cabbage, cauliflower
      Class 2 (high fertility) : sugarcane, paddy, bajra, guava
      Class 3 (very high fertility) : barley, cotton, tobacco, sunflower
  • 44. Results : sample soil records with their classes :
      Ca 9.653, Mg 6.585, K 142, S 108, N 226.05, Lime 5.83, C 1.29, P 18, Moisture 0.9 : Class 1
      Ca 19.88, Mg 22.2, K 339.35, S 77, N 308.25, Lime 6.45, C 2, P 298, Moisture 0.8 : Class 2
      Ca 2.931, Mg 41.22, K 514.29, S 108, N 277.42, Lime 6.43, C 0.74, P 48, Moisture 0.6 : Class 1
    (Figure : confusion matrix for the SVM model, true class vs. predicted class.) Accuracy of SVM : 845 correctly classified and 240 incorrectly classified out of 1085 testing records, i.e. 77.85%. Class-0 labels (first row of the confusion matrix) : 58% predicted correctly, 27% misclassified as class 1, 1.1% misclassified as class 2 and 14% misclassified as class 3.
  • 45. CONCLUSIONS OF CASE STUDY : • The KNN algorithm was also used, but gave a lower accuracy of 72.04%. • The SVM algorithm obtained higher accuracy because it captured non-linearity in the data. • Crops can be recommended based on the soil class. • This can help farmers grow the crop best suited to their soil conditions. • The model can be improved with further hyperparameter tuning, which can increase its accuracy and ultimately help farmers learn their farm's soil fertility level and the crops suggested for it.
  • 46. CASE STUDY-3 : Sugar Cane Crop Yield Estimation Using K-Nearest Neighbors Kumar and Balakrishnan (2018)
  • 47. DATA SOURCE : • Predictors : rainfall, pH, organic carbon, area, S, Cu, Fe, P, Mn, N, fibre. • Dependent variable : yield (tons). • Crop : sugarcane. • State : Telangana. • Period : 1901 to 2016, annual data. • Data were rescaled before analysis.
  • 49. CONCLUSIONS OF CASE STUDY : • Accurate yield predictions across different areas can help farmers earn better profits from their crops. • KNN can be an alternative approach for regression, although it is mostly used for classification problems. • In future, predictions could be made with different algorithms and their accuracies compared to choose the best one.
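KNN regression differs from KNN classification only in the last step: instead of taking a majority vote, it averages the target values of the k nearest neighbours. The predictors in this case study sit on very different scales (rainfall versus micronutrients, for example), so they need rescaling first. The sketch below uses min-max scaling and Euclidean distance. The predictor values and yields are hypothetical, not the Telangana data, and the paper's exact preprocessing is not specified here.

```python
import math


def min_max_scaler(rows):
    """Fit per-column min-max scaling on `rows`; returns a function mapping a row into [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]   # guard against constant columns

    def scale(row):
        return [(v - lo) / s for v, lo, s in zip(row, lows, spans)]

    return scale


def knn_regress(train_X, train_y, query, k=3):
    """Predict the mean target of the k training rows nearest to `query` (Euclidean)."""
    ranked = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))
    return sum(train_y[i] for i in ranked[:k]) / k


# Hypothetical rows: (rainfall in mm, soil pH, N in kg/ha) -> yield in tons
raw_X = [(800, 6.5, 250), (950, 7.0, 280), (700, 6.0, 200), (1100, 7.2, 300), (850, 6.8, 260)]
yields = [80.0, 92.0, 70.0, 98.0, 85.0]
scale = min_max_scaler(raw_X)
X_scaled = [scale(r) for r in raw_X]
print(knn_regress(X_scaled, yields, scale((900, 6.9, 270)), k=2))   # 88.5
```

The query must be transformed with the scaling fitted on the training rows, which is why `min_max_scaler` returns a reusable function.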
  • 50. CASE STUDY-4 : MULTIVARIATE WEATHER ANOMALY DETECTION USING DBSCAN CLUSTERING ALGORITHM Wibisono et al. (2021)
  • 51. DATA SOURCE : • Dataset : 8 attributes from daily weather data. • Place : Semarang city, Indonesia. • Algorithms used : DBSCAN & PCA. All attributes are numerical : minimum temperature, maximum temperature, average temperature, average humidity (%), sun exposure time (hours), maximum wind speed (m/s), average wind speed (m/s) and rainfall (mm).
  • 52. RESULTS : (Figure : DBSCAN result with eps = 0.19.) PC1 mainly consisted of average temperature, maximum temperature and average humidity. PC2 mainly consisted of Tn (minimum temperature).
  • 53. CONCLUSIONS : • The results showed that anomalous weather is characterized by high humidity and low temperature. • The experiments demonstrated that DBSCAN is capable of identifying peculiar data points that deviate from the 'normal' data distribution. • PCA can be used together with DBSCAN to detect noise.
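The PCA-then-DBSCAN idea can be sketched without libraries in three steps: standardise the attributes, project them onto the leading principal components, and flag as noise every point that is neither a core point nor within eps of one. Finding the components by power iteration with deflation is an implementation choice of this sketch. The three attributes, the simulated normal days, the anomalous day and the eps/MinPts values are all hypothetical; the case study used eight attributes and its own parameter choices.

```python
import math
import random


def standardise(rows):
    """Centre each column and divide by its (population) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0 for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def principal_components(rows, n_components=2, n_iter=500, seed=0):
    """Leading eigenvectors of the covariance of centred `rows`, by power iteration with deflation."""
    d = len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / len(rows) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.random() + 0.1 for _ in range(d)]
        for _ in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove this component so the next power iteration finds the next one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components


def dbscan_noise(points, eps, min_pts):
    """Indices of DBSCAN noise: points that are not core and have no core point within eps."""
    nbrs = [[j for j, q in enumerate(points) if math.dist(p, q) <= eps] for p in points]
    core = {i for i, nb in enumerate(nbrs) if len(nb) >= min_pts}
    return [i for i, nb in enumerate(nbrs) if i not in core and not core.intersection(nb)]


rng = random.Random(42)
# Hypothetical daily records: (average temperature in C, average humidity in %, wind speed in m/s)
days = [[rng.gauss(28, 0.5), rng.gauss(75, 2), rng.gauss(3, 0.3)] for _ in range(60)]
days.append([22.0, 98.0, 3.0])          # hypothetical anomalous day: cold and very humid
z = standardise(days)
pcs = principal_components(z)
projected = [[sum(a * b for a, b in zip(row, c)) for c in pcs] for row in z]
print(dbscan_noise(projected, eps=1.0, min_pts=4))
```

Index 60, the simulated cold and humid day, should appear among the noise points. Depending on eps and MinPts, a few ordinary days in the sparse tails of the cloud may be flagged too, which is why these parameters need tuning.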
  • 54. CONCLUSIONS OF MACHINE LEARNING : • No algorithm is appropriate for all situations. • Choosing a technique depends on the pattern and type of the data and on the experience of the analyst. • Using ML algorithms in a pipeline can save the analyst's time and give farmers fast solutions. • There is wide scope for applying ML in agriculture, especially in plant disease classification, soil or crop classification and crop yield prediction. • Automation can help reduce the biotic and abiotic stresses prevailing in fields across the country.
  • 55. REFERENCES :
    • Akshatha, G.C. and Shastry, K.A., 2022. Crop and fertilizer recommendation system based on soil classification. In Recent Advances in Artificial Intelligence and Data Engineering (pp. 29-40).
    • Bharate, A.A. and Shirdhonkar, M.S., 2020. Classification of grape leaves using KNN and SVM classifiers. In 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC) (pp. 745-749).
    • Kumar, N.N. and Balakrishnan, M., 2018. Sugar cane crop yield estimation using K-Nearest Neighbors. Journal of Advance Research in Dynamical and Control Systems, 10(4), pp. 199-207.
    • Wibisono, S., Anwar, M.T., Supriyanto, A. and Amin, I.H.A., 2021. Multivariate weather anomaly detection using DBSCAN clustering algorithm. Journal of Physics: Conference Series, 1869(1), p. 012077.