KNN Algorithm to Determine
Optimum Agricultural Commodities in
Smart Farming
1st Ahmad Cucus
Faculty of Computing
Universiti Malaysia Pahang
Pekan, Malaysia
ahmad.cucus@ubl.ac.id
4th Yoga Pristyanto
Faculty of Computer Science
Universitas Amikom Yogyakarta
Sleman, Indonesia
yoga.pristyanto@amikom.ac.id
2nd Al-Fahim Mubarak Ali
Faculty of Computing
Universiti Malaysia Pahang
Pekan, Malaysia
fahim@ump.edu.my
5th Ferian Fauzi Abdulloh
Faculty of Computer Science
Universitas Amikom Yogyakarta
Sleman, Indonesia
ferian@amikom.ac.id
3rd Afrig Aminuddin
Faculty of Computer Science
Universitas Amikom Yogyakarta
Sleman, Indonesia
afrig@amikom.ac.id
6th Zafril Rizal M. Azmi
Faculty of Computing
Universiti Malaysia Pahang
Pekan, Malaysia
zafril@ump.edu.my
Research Background
• The process of identifying the most suitable crop for a farmer's
land might present a formidable challenge, yet it holds significant
importance in optimizing crop productivity and agricultural
efficiency.
• Smart farming, a new trend in agricultural technology, is a concept of
agricultural management that uses modern technology to increase the
quantity and quality of agricultural products.
• The utilization of algorithmic approaches and technology can greatly aid
in determining the suitable type of vegetation for a given land area.
Research Questions
• How can the appropriate plant type be determined using an
algorithm and machine learning approach?
• How can the string data type be used to determine the Euclidean value of
two variables?
• How can the supplied dataset be used to calculate plant selection
decisions?
Research Purpose
• The K-Nearest Neighbor technique (KNN algorithm) is a technique for
categorizing data based on learning data obtained from the parameter
values of the closest data points. In this study, the KNN algorithm is
used to solve the issue. The method seeks to determine the distance
between the variables in the dataset and the case variables.
• The main contribution of this research is that we apply a string data
type matching technique and then calculate the Euclidean value between
the dataset variable and the case data variable.
Methodology
• Through the process of data collection, the K-Nearest Neighbors (KNN)
approach can be used to determine whether particular plants are suitable
for a given set of land conditions.
The KNN algorithm requires only the parameter k, labeled training samples,
and a metric for measuring distances in n-dimensional space. The steps in
the KNN classification process are as follows [20]:
1. Determine the parameter k (number of nearest neighbors).
2. Calculate the distance between each test sample and all training
samples (Euclidean distance).
3. Order the distances from smallest to largest and determine the k nearest
neighbors.
4. Determine the class of each nearest neighbor, and select the majority
class among them as the predicted class of the test sample.
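As an illustrative sketch of these four steps (not the authors' exact implementation), standard KNN classification over the numeric soil and climate fields can be run with scikit-learn; the file name, column names, and k value are assumptions based on the Kaggle crop recommendation dataset described below.

# Sketch: standard KNN classification on the numeric features only.
# File name, column names, and k are assumptions, not values from the paper.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("crop_recommendation.csv")  # hypothetical file name
X = df[["N", "P", "K", "temperature", "humidity", "ph", "rainfall"]]
y = df["label"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Step 1: choose k. Steps 2-4 (distance, sorting, majority vote) are handled internally.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print("accuracy:", knn.score(X_test, y_test))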
Dataset
We used datasets from Kaggle as the basis for model
testing. Kaggle provides a cloud-based data science
platform and enormous datasets. Several fields are
included in the dataset, including N, P, K,
temperature, humidity, pH, and rainfall. Besides these, we added a variable
with a string data type: the types of pests found on that land.
N  | P  | K  | Temp  | Hum  | pH   | Rainfall | Pest                                                            | Label
90 | 42 | 43 | 20.88 | 82   | 6.5  | 203      | aphids, armyworms, beetles, bollworms, grasshoppers             | rice
85 | 58 | 41 | 21.77 | 80.3 | 7.04 | 227      | bollworms, grasshoppers, mites, mosquitoes                      | rice
60 | 55 | 44 | 23    | 82.3 | 7.84 | 264      | grasshoppers, mites, mosquitoes, sawflies, stemborers           | rice
74 | 35 | 40 | 26.49 | 80.2 | 6.98 | 243      | sawflies, stemborers, leafhoppers, earworms                     | rice
78 | 42 | 42 | 20.13 | 81.6 | 7.63 | 263      | bollworms, grasshoppers, mites, mosquitos, sawflies, stemborers | rice
The dataset consists of 2,200 rows with 22 distinct labels (plant types),
and each label consists of 100 records.
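To make the added string-typed field concrete, here is a minimal sketch of how one record from the table above (the first row) could be represented and its pest list split for matching; the field names are illustrative, not taken from the paper.

# One dataset record, including the added comma-separated pest field.
record = {
    "N": 90, "P": 42, "K": 43,
    "temperature": 20.88, "humidity": 82, "ph": 6.5, "rainfall": 203,
    "pest": "aphids, armyworms, beetles, bollworms, grasshoppers",
    "label": "rice",
}

# Split the pest string into a set so it can later be matched by intersection.
pests = {p.strip() for p in record["pest"].split(",")}
print(pests)  # {'aphids', 'armyworms', 'beetles', 'bollworms', 'grasshoppers'}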
Approach
The KNN algorithm usually uses the Euclidean
distance in test and training data calculations.
The Euclidean distance is described as follows
Basic Algorithm
dist(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}

where dist(x, y) represents the scalar distance between data vectors x and
y, and i denotes the i-th feature of the data.
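A minimal Python sketch of this formula (assuming NumPy is available), computing the distance between two numeric feature vectors:

import numpy as np

def euclidean(x, y):
    # dist(x, y) = sqrt(sum_i (x_i - y_i)^2)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum((x - y) ** 2)))

# Example with illustrative feature values:
print(euclidean([90, 42, 43], [85, 58, 41]))  # ~16.88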
Proposed Algorithm
The novelty in this study is our proposed
model for matching string-typed variables
in the KNN algorithm. In the dataset, a
string-typed variable sometimes holds more
than one value, with a comma separating the
values. For such a variable, the Euclidean
value is calculated as

dist(x, y) = \sqrt{\left(\sum(x \cap y) - \sum y\right)^2}

where Σy represents the number of values in the case data
variable, and Σ(x ∩ y) is the number of values in the case data
variable that also occur among the values of the dataset variable.
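A minimal sketch of this string-matching term, assuming each comma-separated pest list is split into a set and Σ counts the items in a list; the function name is ours, not the paper's. Because of the square and square root, the term reduces to the absolute difference between the intersection size and the case list size.

def pest_distance(dataset_pests: str, case_pests: str) -> float:
    """Distance term for a comma-separated string variable:
    sqrt((|x ∩ y| - |y|)^2), with y the case pest list and x the dataset pest list."""
    x = {p.strip() for p in dataset_pests.split(",") if p.strip()}
    y = {p.strip() for p in case_pests.split(",") if p.strip()}
    return float(abs(len(x & y) - len(y)))

# Two of the three case pests also appear in the dataset record, so the term is 1.
print(pest_distance("aphids, armyworms, beetles", "armyworms, beetles, grasshoppers"))  # 1.0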
Algorithm
Algorithm 1. KNN for Plant Determination
BEGIN
  SET PlantD[cr1, cr2, ..., crN]    // dataset records (criteria per record)
  SET PlantNC[cr1, cr2, ..., crN]   // new case (criteria of the land to classify)
  LenPlantDs = PlantD.length
  LOOP i FROM 0 TO LenPlantDs - 1
    dist[i] = (PlantD.cr1[i] - PlantNC.cr1)^2
            + (PlantD.cr2[i] - PlantNC.cr2)^2
            + ...
            + (PlantD.crN[i] - PlantNC.crN)^2
    euclidean[i] = SQRT(dist[i])
  END LOOP
  SORT euclidean[] ASCENDING        // nearest records first
  RETURN euclidean[]
END
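Below is a runnable Python sketch of Algorithm 1 that combines the numeric squared differences with the proposed pest term and returns the records ranked by distance; the record layout, field names, and example usage are assumptions for illustration.

import math

def knn_distances(dataset, new_case, numeric_keys):
    """Return (distance, record) pairs sorted ascending, combining numeric
    Euclidean terms with the proposed string-intersection term for pests."""
    case_pests = {p.strip() for p in new_case["pest"].split(",")}
    results = []
    for rec in dataset:
        sq = sum((rec[k] - new_case[k]) ** 2 for k in numeric_keys)
        rec_pests = {p.strip() for p in rec["pest"].split(",")}
        sq += (len(rec_pests & case_pests) - len(case_pests)) ** 2  # proposed string term
        results.append((math.sqrt(sq), rec))
    return sorted(results, key=lambda pair: pair[0])

# Usage sketch: the nearest record's label is taken as the prediction for k = 1.
# ranked = knn_distances(records, new_case,
#                        ["N", "P", "K", "temperature", "humidity", "ph", "rainfall"])
# predicted_label = ranked[0][1]["label"]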
Implementation
Parameter   | New Case                      | Dataset
N           | 89                            | 90
P           | 49                            | 42
K           | 32                            | 43
Temperature | 26.9                          | 20.9
Humidity    | 82.9                          | 82.0
pH          | 7.9                           | 6.5
Rainfall    | 302.3                         | 202.9
Pest        | armyworm, beetle, grasshopper | aphids, armyworms, beetle, bollworm, grasshopper
Label       | -                             | rice
The next step is to find the distance using the KNN algorithm:

dist(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
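As a worked check of this formula on the numeric fields above (an illustrative hand calculation, not a value reported in the study):

dist = \sqrt{(89-90)^2 + (49-42)^2 + (32-43)^2 + (26.9-20.9)^2 + (82.9-82.0)^2 + (7.9-6.5)^2 + (302.3-202.9)^2}
     = \sqrt{1 + 49 + 121 + 36 + 0.81 + 1.96 + 9880.36} = \sqrt{10090.13} \approx 100.45

If singular and plural pest names are treated as the same pest, all three case pests also appear in the dataset record, so the pest term (\sum(x \cap y) - \sum y)^2 = (3 - 3)^2 contributes 0 and the total distance remains approximately 100.45.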
Result
Parameter       | New Case | D1    | D2    | D3    | D4
N               | 89       | 93    | 82    | 93    | 79
P               | 49       | 47    | 40    | 53    | 42
K               | 32       | 37    | 40    | 38    | 37
Temp            | 26.9     | 21.5  | 23.8  | 26.9  | 24.9
Humidity        | 82.9     | 82.1  | 84.8  | 81.9  | 82.8
pH              | 7.9      | 6.5   | 6.3   | 7.1   | 6.6
Rainfall        | 302.3    | 295.9 | 298.6 | 290.7 | 295.6
Pest            | bees, beetle, grasshopper | aphids, armyworms, beetles, bollworms, grasshoppers | bollworms, grasshoppers, mites, mosquitoes | grasshoppers, bees, mosquitoes, sawflies, stemborers | sawflies, stemborers, beetle, grasshopper
Normalized Pest | -        | 1     | 2     | 1     | 1
Label           | -        | rice  | rice  | rice  | rice
Euclidean       | -        | 25.9  | 36.4  | 28.4  | 33.1
Rank            | -        | 1     | 2     | 3     | 4
The table also presents the ranking of each
Euclidean value, which is used to determine the
label of the new sample data. The Euclidean
distance is computed between the new case and
each dataset record, and the record with the
smallest value in the group is selected.
Discussion
The comparison of Euclidean values without the
pest variable is shown in the figure. It can be
seen that the commodities closest to the sample
data, in order of smallest value, are rice, jute,
and papaya.
After comparing the values of the numeric variables, we examine the results
of matching the string-typed variable, the pest list. The gaps between
records on the pest variable are very small because its values range from a
minimum of 0 to a maximum of 3. A value of 0 indicates a strong similarity,
since every pest in the sample case data also appears among the pests in the
dataset record.
Conclusion
The process of matching string-typed variables is usually expressed as a percentage of
similarity between the data, but calculations that produce percentage values are unsuitable
for the weighting process. Therefore, we proposed a model and equation within the KNN
algorithm that can be used to find Euclidean values between string variables. The data
visualization results show that the rice, jute, and papaya classes are closest to the example
environmental cases. The use of KNN with an added data-matching step for string-typed
variables is considered successful in providing decision support information for choosing the
commodity to be planted. This study also obtained values that are closer to the case examples.
However, other algorithms should be compared on this problem to obtain more varied results,
and the parameters used can be developed further to make the results more precise.