Crop Recommendation System Using Neural
Networks with Soil and Weather Data for
Optimized Agricultural Decision-Making
IN13/00079/21 YUSUF KANTAI
IN13/00017/21 REBECCA WAMBUA
IN13/00040/21 TERRY LECHUTA
INTRODUCTION
• Agriculture plays a crucial role in ensuring food security and supporting the global economy.
However, farmers often face challenges in deciding which crops to grow, especially with
varying soil conditions, weather patterns, and climate change impacts (Stekhoven &
Buhlmann, 2012). To address this challenge, we developed a Crop Recommendation System
using Neural Networks leveraging soil and rainfall data for more effective decision-making.
• Integrating artificial intelligence into agriculture promotes sustainable farming by improving
productivity, minimizing resource wastage, and boosting overall crop yield (Stekhoven &
Buhlmann, 2018). This aligns with recent studies emphasizing the significance of data-driven
models in solving agricultural challenges (Monteiro Thomas, 2022; Morales & Villalobos,
2023).
• This system aims to assist farmers in selecting the most suitable crops for their specific
conditions by analyzing data such as soil type, pH level, moisture, and local weather patterns
like temperature and rainfall. Choosing the appropriate crop in relation to location-specific
soil factors and climatic conditions is also vital for enhancing production (Wang, Shi, & Wen,
2023).
• Therefore, farmers must be equipped with instruments that allow them to choose the best
crop for the region’s unique meteorological and soil conditions (Uddin, Matin, & Meyer,
2019). In developing nations, the use of machine learning for agricultural planning has
led to applications such as crop recommendation, crop disease
diagnosis, and fertilizer management (Sarker, Alam, & Gow, 2019).
• Farmers would benefit from a crop recommendation system that considers location-specific factors.
This work aims to build a recommendation algorithm that suggests the highest-yielding crop based on
terrain and climate factors unique to a particular region (Sharma et al., 2020). A Random Forest model is
used to recommend crops based on terrain and environmental factors (Shehadeh et al., 2021).
• Tarek et al. (2023) predicted soil erosion status using a novel random forest model optimized by the random search
method. This approach improves the accuracy of the model by fine-tuning its parameters, making it more effective at
analyzing the complex factors that contribute to soil erosion. The optimized model provides a reliable tool for assessing soil
erosion risk, which is crucial for sustainable land management and conservation efforts. Bhadouria et al.
(2019) examine the impact of climate change on agricultural ecosystems, focusing on the challenges and consequences
that arise in this new era. They discuss how changes in temperature, precipitation patterns, and extreme weather events
affect crop yields, soil health, and water availability, posing significant risks to agricultural productivity.
• Tailor Brown (2019) developed an Integrated Climatic Assessment Indicator (ICAI) specifically for assessing wheat
production. The ICAI combines climate-related factors such as temperature, rainfall, and humidity into a single
comprehensive metric to evaluate their impact on wheat growth and yields. Paudel Jones (2019) used machine learning
alongside agronomic principles from traditional crop modeling to create a reliable baseline for predicting crop yields on a
large scale. By integrating data-driven machine learning methods with established agricultural knowledge, they improved
the accuracy of yield predictions, considering factors like soil properties, weather conditions, and crop management
practices.
CONTRIBUTION
• This paper makes several contributions to the field of agricultural technology by building on prior research in crop
prediction, machine learning, and sustainable agriculture.
• Specifically, it advances the work of Wang et al. (2023) by addressing location-specific crop recommendation
systems, focusing on optimizing agricultural yields based on environmental factors such as soil properties, moisture,
and weather patterns. The study also complements the research by Uddin, Matin, and Meyer (2019) by emphasizing
the development of climate-based decision-making tools for regions facing changing weather dynamics.
• Additionally, this paper aligns with the findings of Zhang et al. (2018) and Monteiro et al. (2022), who demonstrated
the potential of AI and machine learning for crop forecasting and agricultural management. By integrating graph
convolutional neural networks (GCNNs), the proposed system offers a novel approach to modeling spatial and
environmental relationships, enhancing the predictive accuracy for crop selection. This extends the work of Sharma
et al. (2020) by addressing regional variations in soil and climate more effectively.
• Furthermore, the system builds on ensemble learning methodologies, as highlighted by Sagi and Rokach (2018),
while also refining terrain-based crop recommendations, following Shehadeh et al. (2021). By considering region-
specific agricultural constraints and potentials, this research offers practical tools for sustainable farming practices,
advancing the discussion initiated by Letey (2017) on the interplay between soil properties and crop productivity.
RELATED WORK
• ML models have been applied widely in agriculture, for example to crop yield prediction, weather forecasting, smart irrigation systems,
crop disease prediction, and setting minimum support prices (Young, 2016; Nandy and Singh, 2020; Sharma et al., 2020; Cravero
and Sepulveda, 2021). To achieve accurate predictions, researchers have used supervised ML algorithms for crop
production prediction (Kaur, 2016; Shehadeh et al., 2021). Others have proposed a methodology that uses Average
Pearson Correlation (APC) and Coefficient of Variation (CV) to identify indicators of crop price fluctuation (Pereira et al.,
2021). All these methods require a clearly described dataset, which is difficult to obtain in the context of Bangladesh.
• Van Klompenburg et al. (2020) conducted a comprehensive review, highlighting that soil composition, temperature, and rainfall are key features often
used, with artificial neural networks (ANNs) being a popular algorithm in such models. Rashid et al. (2021) explored multiple machine
learning (ML) algorithms, with a focus on predicting agricultural yields, particularly for palm oil. Kalimuthu et al. (2020) utilized the
Naive Bayes algorithm, while Sharma et al. (2021) provided an extensive review of ML applications in agriculture, particularly in areas
like livestock productivity through machine learning and computer vision for behavioral predictions.
• Cunha et al. (2018) developed a pre-season forecast model for soybean and maize, excluding NDVI data, integrating soil parameters from
satellite data, climate forecasts, and rainfall information. Pande et al. (2021) built a practical ML-based system for crop yield prediction
and fertilizer recommendations to boost yields. Reddy and Kumar (2021) proposed an ML-based approach to identify profitable crops and
forecast yields using algorithms like SVM, ANN, RF, multivariate regression, and k-NN. Tahaseen and Moparthi (2021) demonstrated
how various ML techniques can predict crop yields based on factors such as weather and temperature, with dataset availability influencing
feature selection.
• Sharma et al. (2021) examined methods for weed and pest detection, crop prediction, and leaf disease diagnosis, discussing the state of global
agricultural yield forecasting. Ray et al. (2022) used distribution and correlation analysis to propose a model for 22 crop types, achieving an
accuracy of 99.54%. Vashisht et al. (2022) applied extreme learning machines to predict rice yield based on geographical and seasonal factors.
Gupta et al. (2022) emphasized the potential of ML to segment large datasets for yield prediction.
• Seireg et al. (2022) utilized cascading and stacking regression to predict blueberry yield with high accuracy, while Rasheed et al. (2021) tested a
decision-making tool on historical agricultural data in Pakistan to predict net profits. Pant et al. (2021) defined the use of ML techniques to
identify trends in data for crop prediction. Chandraprabha et al. (2021) utilized predictive analytics for soil nutrient forecasting, while Raja et al.
(2022) demonstrated that ensemble techniques can enhance yield predictions over traditional classification methods.
• Cedric et al. (2022) presented a decision tree and k-NN-based ML model for forecasting crop yields in West Africa. Ali et al. (2022) employed
remote sensing and statistical models to evaluate crop production. Pantazi et al. (2016) predicted wheat yield using unsupervised learning with
satellite and soil data. Aghighi et al. (2018) predicted maize yield using time-series imagery from Landsat 8, introducing a modified feature
selection method that outperformed others with 95% accuracy (Mariammal et al., 2021).
• Kumar et al. (2021) incorporated pre-processing, exploratory data analysis (EDA), and detection modules for plant disease prediction,
achieving over 98% accuracy. Ziliani et al. (2022) combined the APSIM crop model with CubeSat images to produce high-resolution yield
maps. Vlachopoulos et al. (2022) determined that random forests were the best for green area index (GAI) prediction with an RMSE of
10.86%.
• Goel and Mishra (2022) achieved 95.64% accuracy using deep learning for phenological data, while Elavarasan and Vincent (2020) found that
Q-learning networks offered superior yield predictions. Haque et al. (2020) applied the ANN method to examine the impact of different factors
on crop yield, using error rates to evaluate performance. Cunha and Silva (2020) developed a model that used weather forecasts and crop
calendars to predict yields. Bose et al. (2016) employed spiking neural networks to analyze remote sensing data for crop yield prediction,
achieving an accuracy of 95.64%.
• Saeed and Lizhi (2019) developed a deep neural network (DNN) approach to enhance prediction accuracy, while Sun et al. (2020)
integrated RNN and CNN for extracting spatial and temporal features from time-series data. Qiao et al. (2021) introduced a deep learning
architecture combining RNNs and 3D CNNs for crop yield forecasting from multispectral images. Kalaiarasi and Anbarasi (2022)
introduced a multiple kernel DNN to enhance learning capacity for medium-scale agricultural datasets.
• Abbaszadeh et al. (2022) combined deep learning networks like 3DCNN and ConvLSTM to predict soybean yield, with probabilistic
outputs. Pang et al. (2020) used CNNs and hyperspectral imaging to model spectral data, comparing PCA and multidimensional scattering
correction. Alebele et al. (2021) applied Gaussian kernel regression to rice yield prediction, outperforming other Bayesian methods.
• Martinez et al. (2021) utilized Gaussian processes to identify climate extremes affecting crop productivity, while Qiao et al. (2021)
developed a 3D convolutional neural multi-kernel network to capture hierarchical features for yield prediction. Sivanantham et al. (2022)
improved accuracy by using orthogonal basis functions and quantile regression.
• Li et al. (2022) focused on combining solar-induced fluorescence (SIF), satellite, and environmental data for crop yield prediction. Gupta
et al. (2021) applied MapReduce architecture and K-means clustering for crop prediction based on soil and weather data. Liu et al. (2022)
used MLR to predict plant diseases with 91% accuracy, while Udutalapally et al. (2021) trained CNNs to achieve 99.24% accuracy in
disease prediction. Makkithaya and G. (2022) used deep residual networks for soybean prediction, while Mehta et al. (2021) compared
CNN and LSTM models for crop yield forecasting.
• Mopidevi et al. (2022) employed deep learning to predict Ficus stem growth, while Swarnakantha et al. (2022) evaluated comparative
studies on crop development. Bhansali et al. (2022) built a recommendation model using N-P-K and rainfall data to diagnose diseases and
provide treatment suggestions, while Nancy et al. (2022) developed an image-based plant disease detection system using machine learning
and deep learning.
METHODOLOGY
Flow of the Proposed System
Dataset
The dataset consists of the following parameters: Nitrogen (N), Phosphorus (P),
Potassium (K), soil pH, Humidity, Temperature, and Rainfall. The
dataset was obtained from the Kaggle website.
Feature        Description
N              Nitrogen content in the soil (kg/ha)
P              Phosphorus content in the soil (kg/ha)
K              Potassium content in the soil (kg/ha)
Temperature    Average temperature (°C)
Humidity       Average humidity (%)
pH             pH level of the soil
Rainfall       Average rainfall (mm)
Label          Categorical variable indicating the recommended crop
Data preprocessing
• Before building the model, the following preprocessing steps were applied to
the dataset:
• Handling Missing Data
• The dataset was examined for any missing values using the
crop.isnull().sum() function. This check revealed that there were no missing
data points, ensuring the dataset was complete.
• The data types of each feature were inspected using crop.info() to confirm
that the numerical and categorical data were appropriately categorized,
preventing any issues during model training.
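The two checks above can be sketched as follows. This is a minimal illustration, assuming the data is loaded into a pandas DataFrame named crop as in the slides; the rows and column names below are hypothetical placeholders, since the actual Kaggle file is not reproduced here.

```python
import pandas as pd

# Hypothetical stand-in rows; the real Kaggle file and its values are not shown in the slides.
crop = pd.DataFrame({
    "N": [90, 85, 60],
    "P": [42, 58, 55],
    "K": [43, 41, 44],
    "temperature": [20.9, 21.8, 23.0],
    "humidity": [82.0, 80.3, 82.3],
    "ph": [6.5, 7.0, 7.8],
    "rainfall": [202.9, 226.7, 263.9],
    "label": ["rice", "rice", "rice"],
})

# Count missing values per column, then in total.
missing_per_column = crop.isnull().sum()
total_missing = int(missing_per_column.sum())
print(missing_per_column)
print("total missing:", total_missing)

# Print each column's dtype and non-null count.
crop.info()

# List the columns pandas treats as numeric; the crop label should not be among them.
numeric_cols = crop.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
```

If total_missing were greater than zero, an imputation or row-dropping step would be needed before training; the slides report that this was not the case.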
Handling of datasets
Normalization and Standardization
• The dataset underwent normalization and standardization. First, the features were
scaled using MinMaxScaler(), which maps each feature to the range
0 to 1 so that features with larger ranges do not dominate the training process.
After that, StandardScaler() was applied, rescaling each feature to a
mean of 0 and a standard deviation of 1. Because standardization is unaffected by a
prior linear rescaling, the final values match those of standardization alone.
Standardization puts features with different units or scales on a common footing,
which is particularly beneficial for scale-sensitive algorithms such as SVM and
Logistic Regression; it does not by itself make the data normally distributed.
This preprocessing can improve the model's performance and convergence speed.
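A minimal sketch of the two scaling steps, using scikit-learn's MinMaxScaler and StandardScaler on a few hypothetical feature rows (the values are placeholders, not taken from the dataset). It also compares the result against StandardScaler applied directly to the raw features.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical rows: N, P, K, temperature, humidity, pH, rainfall.
X = np.array([
    [90.0, 42.0, 43.0, 20.9, 82.0, 6.5, 202.9],
    [20.0, 67.0, 20.0, 18.5, 19.1, 5.7, 105.3],
    [40.0, 72.0, 77.0, 17.0, 16.9, 7.5, 88.6],
    [104.0, 18.0, 30.0, 23.6, 60.3, 6.8, 140.9],
])

# Step 1: rescale every feature to the range [0, 1].
X_minmax = MinMaxScaler().fit_transform(X)

# Step 2: standardize each feature to mean 0 and standard deviation 1.
X_scaled = StandardScaler().fit_transform(X_minmax)

# For comparison: standardization applied directly to the raw features.
X_std_only = StandardScaler().fit_transform(X)
same_as_standard_only = np.allclose(X_scaled, X_std_only)

print(X_scaled.mean(axis=0).round(6))
print(X_scaled.std(axis=0).round(6))
print("matches StandardScaler alone:", same_as_standard_only)
```

In a train/test workflow the scalers would normally be fitted on the training split only and then applied to the test split, so that no test statistics leak into training.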
Feature Correlation
A correlation matrix was generated to understand the relationships between features. The
matrix showed that nitrogen (N) and phosphorus (P) had a weak negative correlation
(-0.23), while phosphorus (P) and potassium (K) exhibited a strong positive correlation
(0.74), indicating that these two elements often vary together in the dataset.
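A correlation matrix of this kind can be computed with pandas. The sketch below assumes Pearson correlation and uses hypothetical N, P, and K values, so it does not reproduce the -0.23 and 0.74 figures reported above.

```python
import pandas as pd

# Hypothetical nutrient readings; placeholders, not the Kaggle data.
crop = pd.DataFrame({
    "N": [90, 20, 40, 104, 60],
    "P": [42, 67, 72, 18, 55],
    "K": [43, 20, 77, 30, 44],
})

# Pairwise Pearson correlation between all numeric columns.
corr = crop.corr(method="pearson")
print(corr.round(2))

# Look up a single pair, here nitrogen versus phosphorus.
n_p = corr.loc["N", "P"]
print("corr(N, P):", round(n_p, 2))
```

Each entry lies between -1 and 1, the diagonal is 1, and the matrix is symmetric, so only the upper or lower triangle needs to be read.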
Feature Selection
• This step focuses on identifying and using the most relevant attributes in
the dataset. Irrelevant and redundant information is
removed before the classifiers are applied. The proposed system applies
several Machine Learning algorithms: Decision Tree, Naïve Bayes (NB),
Support Vector Machine (SVM), Logistic Regression (LR), Random Forest
(RF), and XGBoost.
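As a rough illustration of how several of these classifiers can be trained and compared, the sketch below uses scikit-learn on synthetic data from make_blobs (an assumption; it is not the crop dataset). XGBoost is left out because it lives in a separate package, and the hyperparameters shown are defaults chosen for illustration, not the settings used in this work.

```python
from sklearn.datasets import make_blobs
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 300 samples, 7 features, 3 classes (NOT the crop data).
X, y = make_blobs(n_samples=300, n_features=7, centers=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Fit the scaler on the training split only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Train each model and record its accuracy on the held-out test split.
scores = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    scores[name] = model.score(X_test_s, y_test)

# Report models from highest to lowest test accuracy.
for name, acc in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.3f}")
```

Keeping the models in a dictionary makes it easy to add or remove a classifier without touching the training loop.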
Random Forest
• Random forest first builds new datasets by randomly sampling rows, with replacement, from the original data (bootstrapping).
A decision tree is then trained on each of the bootstrapped datasets independently. The model randomly selects
a subset of features for each tree and uses only them for training. Since this is a classification problem, the prediction is made by
taking the majority vote of all the decision trees. This classification can also be expressed mathematically as shown:
• $$H(x) = \arg\max_{y \in Y} \sum_{i=1}^{N} \mathbb{I}\big(h_i(x) = y\big)$$
• Where:
• $H(x)$ is the final predicted class for input $x$.
• $h_i(x)$ is the prediction of the $i$-th decision tree.
• $Y$ is the set of possible classes.
• $\mathbb{I}(\cdot)$ is an indicator function that equals 1 if the condition is true and 0 otherwise.
• $N$ is the total number of decision trees in the forest.
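The bootstrapping and majority-vote steps can be sketched in plain Python. The helper names bootstrap_sample and majority_vote, the seed, and the example votes are hypothetical and chosen only for illustration; a real forest would also train a decision tree on each sample.

```python
import random
from collections import Counter


def bootstrap_sample(rows, seed):
    """Draw len(rows) rows at random WITH replacement (one bootstrapped dataset)."""
    rng = random.Random(seed)
    return rng.choices(rows, k=len(rows))


def majority_vote(tree_predictions):
    """Return H(x): the class predicted by the largest number of trees."""
    counts = Counter(tree_predictions)
    # most_common(1) returns the top class; ties go to the class seen first.
    return counts.most_common(1)[0][0]


rows = [("sample", i) for i in range(6)]
sample = bootstrap_sample(rows, seed=0)
print(sample)

# Hypothetical predictions h_1(x) ... h_5(x) from N = 5 trees for one input x.
tree_predictions = ["rice", "maize", "rice", "rice", "coffee"]
predicted = majority_vote(tree_predictions)
print("H(x) =", predicted)
```

Here "rice" receives three of the five votes, so it is returned as the forest's prediction.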
Evaluation metrics
• The coefficient of determination (R²) is used as the evaluation metric for measuring the accuracy of all the models.
R² is a statistical measure of how much of the variation in one variable is explained by a
second variable when predicting the outcome of an event.
• The formula for R² can be expressed as:
• $$R^2 = 1 - \frac{RSS}{TSS}$$
• Where:
• R² is the coefficient of determination.
• RSS is the Residual Sum of Squares, the total squared difference between the actual and predicted
values.
• TSS is the Total Sum of Squares, the total squared difference between the actual values and their mean.
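A short worked example of the R² computation from RSS and TSS as defined above. The function name r_squared and the numeric values are hypothetical and serve only to show the arithmetic.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_actual = sum(actual) / len(actual)
    # RSS: squared differences between actual and predicted values.
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # TSS: squared differences between actual values and their mean.
    tss = sum((a - mean_actual) ** 2 for a in actual)
    if tss == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - rss / tss


# Hypothetical values: mean = 6, TSS = 20, RSS = 0.75, so R^2 = 1 - 0.75 / 20.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.5, 7.0, 9.5]
r2 = r_squared(actual, predicted)
print(round(r2, 4))  # 0.9625
```

A perfect prediction gives RSS = 0 and therefore R² = 1, while always predicting the mean gives RSS = TSS and R² = 0.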
Conclusion
• Given suitable datasets, machine learning models can predict with reasonable accuracy whether a crop will be profitable.
This study used several machine learning algorithms to recommend crops according to weather conditions
and soil nutrients. Random Forest outperformed the rest of the algorithms in this study, with a testing R² of
about 99%. With this work, farmers can increase the productivity of their agriculture and prevent soil degradation
on cultivated land.
• They can also reduce the use of chemicals in crop production and make better use of water resources. Further
research can consider more varieties of crops. The current research focuses on twenty-two
crops due to the limited availability of data. In future studies, soil fertility could be assessed at
a more granular geographical level, using micronutrient data such as sulfur, zinc, iron, and manganese. In addition, a
machine learning framework could be built to recommend optimum amounts of pesticides and fertilizers
for a particular crop. Doing so could increase both the production of quality crops and the profits of farmers.
References
1. Stekhoven, D. J., and Buhlmann, P. (2012). MissForest—nonparametric missing value imputation for mixed-type data.
2. Sujjaviriyasup, T., and Pitiruek, K. (2013). Agricultural product forecasting using a machine learning approach.
3. Tavares, O. C. H., Santos, L. A., Filho, D. F., Ferreira, L. M., Garcia, A. C., Castro, T. A. V. T., et al. (2021). Response surface modeling of humic
acid stimulation of the rice (Oryza sativa L.) root system. Arch. Agron.
4. Uddin, K., Matin, M. A., and Meyer, F. J. (2019). Operational flood mapping using multitemporal Sentinel-1 SAR images: A case study from
Bangladesh. Remote Sensing.
5. Van Ittersum, M. K., Cassman, K. G., Grassini, P., Wolf, J., Tittonell, P., and Hochman, Z. (2013). Yield gap
analysis with local to global relevance: a review.
6. Van Klompenburg, T., Kassahun, A., and Catal, C. (2020). Crop yield prediction using machine learning.
7. Sharma, R., Kamble, S. S., Gunasekaran, A., Kumar, V., and Kumar, A. (2020). A systematic literature review on machine learning
applications for sustainable agriculture supply chain performance.
8. Shehadeh, A., Alshboul, O., Al Mamlook, R. E., and Hamedat, O. (2021). Machine learning models for predicting the residual value of
heavy construction equipment: An evaluation of modified decision tree, LightGBM, and XGBoost regression. Automation Construction.
9. Siddique, M. N. E. A., de Bruyn, L. A. L., Osanai, Y., and Guppy, C. N. (2022). Typology of rice-based cropping systems for improved soil
carbon management.
10. P., Altman, D. G., and Sauerbrei, W. (2016). Dichotomizing continuous predictors in multiple regression: a bad idea.
11. Sagi, O., and Rokach, L. (2018). Ensemble learning: A survey. Wiley Interdiscip. Reviews: Data Min. Knowledge Discovery.
12. Sarker, M. A. R., Alam, K., and Gow, J. (2019). Performance of rain-fed Aman rice yield in Bangladesh in the presence of climate change.
Renewable Agric. Food Systems.

Crop recommendation system powerpoint.pptx

  • 1.
    Crop Recommendation SystemUsing Neural Networks with Soil and Weather Data for Optimized Agricultural Decision-Making IN13/00079/21 YUSUF KANTAI IN13/00017/21 REBECCA WAMBUA IN13/00040/21 TERRY LECHUTA
  • 2.
    INTRODUCTION • Agriculture playsa crucial role in ensuring food security and supporting the global economy. However, farmers often face challenges in deciding which crops to grow, especially with varying soil conditions, weather patterns, and climate change impacts. (Stekhoven & Buhlmann, 2012). To address this challenge, we developed a Crop Recommendation System using Neural Networks leveraging soil and rainfall data for more effective decision-making. • Integrating artificial intelligence into agriculture promotes sustainable farming by improving productivity, minimizing resource wastage, and boosting overall crop yield (Stekhoven & Buhlmann, 2018). This aligns with recent studies emphasizing the significance of data-driven models in solving agricultural challenges (Monteiro Thomas, 2022; Morales & Villalobos, 2023).
  • 3.
    • This systemaims to assist farmers in selecting the most suitable crops for their specific conditions by analyzing data such as soil type, pH level, moisture, and local weather patterns like temperature and rainfall. Choosing the appropriate crop in relation to location-specific soil factors and climatic conditions is also vital for enhancing production (Wang, Shi, & Wen, 2023). • Therefore, farmers must be equipped with instruments that allow them to choose the best crop for the region’s unique meteorological and soil conditions (Uddin, Matin, & Meyer, 2019). In developing nations, using machine learning for agricultural planning objectives has resulted in the development of applications such as crop recommendation, crop disease diagnosis, fertilizer management, and so on (Gow, J. (2019).
  • 4.
    • The farmerswould profit from the development of the crop recommendation system considering location-specific factors. The research described in this article tries to create a recommendation algorithm that offers the greatest produce based on terrain and climate factors unique to a particular region (Sharma et al., 2020). In this paper, a Random Forest model was utilized for the recommendation on crop systems depending on terrain and environmental factors (Shehadeh et al., 2021). • Tarek Z et al (2023) Soil erosion status prediction using a novel random forest model optimized by random search method. This approach improves the accuracy of the model by fine-tuning its parameters, making it more effective at analyzing complex factors contributing to soil erosion. The optimized model provides a reliable tool for assessing soil erosion risk, which is crucial for sustainable land management and conservation efforts.The work by Bhadouria R, et al. (2019) examines the impact of climate change on agricultural ecosystems, focusing on the challenges and consequences that arise in this new era. It discusses how changes in temperature, precipitation patterns, and extreme weather events affect crop yields, soil health, and water availability, posing significant risks to agricultural productivity. • Tailor Brown. (2019) developed an Integrated Climatic Assessment Indicator (ICAI) specifically for assessing wheat production. The ICAI combines various climate-related factors, such as temperature, rainfall, and humidity, into a single comprehensive metric to evaluate their impact on wheat growth and yields.Paudel Jones. (2019) utilized machine learning alongside agronomic principles from traditional crop modeling to create a reliable baseline for predicting crop yields on a large scale. 
By integrating data-driven machine learning methods with established agricultural knowledge, they improved the accuracy of yield predictions, considering factors like soil properties, weather conditions, and crop management practices.
  • 5.
    CONTRIBUTION • This papermakes several contributions to the field of agricultural technology by building on prior research in crop prediction, machine learning, and sustainable agriculture. • Specifically, it advances the work of Wang et al. (2023) by addressing location-specific crop recommendation systems, focusing on optimizing agricultural yields based on environmental factors such as soil properties, moisture, and weather patterns. The study also complements the research by Uddin, Matin, and Meyer (2019) by emphasizing the development of climate-based decision-making tools for regions facing changing weather dynamics. • Additionally, this paper aligns with the findings of Zhang et al. (2018) and Monteiro et al. (2022), who demonstrated the potential of AI and machine learning for crop forecasting and agricultural management. By integrating graph convolutional neural networks (GCNNs), the proposed system offers a novel approach to modeling spatial and environmental relationships, enhancing the predictive accuracy for crop selection. This extends the work of Sharma et al. (2020) by addressing regional variations in soil and climate more effectively. • Furthermore, the system builds on ensemble learning methodologies, as highlighted by Sagi and Rokach (2018), while also refining terrain-based crop recommendations, following Shehadeh et al. (2021). By considering region- specific agricultural constraints and potentials, this research offers practical tools for sustainable farming practices, advancing the discussion initiated by Letey (2017) on the interplay between soil properties and crop productivity.
  • 6.
    RELATED WORK • Variousapplications of ML models in agriculture have been, such as crop yield prediction, weather forecasting, smart irrigation system, crop disease prediction, and deciding minimum support price (Young, L. J., 2016; Nandy and Singh, 2020; Sharma et al., 2020; Cravero and Sepulveda, 2021). Moreover, in order to achieve accurate predictions, researchers used the supervised ML algorithms for crop production prediction in (Kaur, 2016; Shehadeh et al., 2021). In addition, many researchers proposed a methodology that uses Average Pearson Correlation (APC) and Coefficient of Variance (CV) to determine indications that reveal crop price fluctuation (Pereira et al., 2021). All these methods require the dataset to be extremely clearly described, which is difficult to generate in the context of Bangladesh. • • Van et al. (2020) conducted a comprehensive review, highlighting that soil composition, temperature, and rainfall are key features often used, with artificial neural networks (ANNs) being a popular algorithm in such models. Rashid et al. (2021) explored multiple machine learning (ML) algorithms, with a focus on predicting agricultural yields, particularly for palm oil. Kalimuthu et al. (2020) utilized the Naive Bayes algorithm, while Sharma et al. (2021) provided an extensive review of ML applications in agriculture, particularly in areas like livestock productivity through machine learning and computer vision for behavioral predictions. • Cunha et al. (2018) developed a pre-season forecast model for soybean and maize, excluding NDVI data, integrating soil parameters from satellite data, climate forecasts, and rainfall information. Pande et al. (2021) built a practical ML-based system for crop yield prediction and fertilizer recommendations to boost yields. Reddy and Kumar (2021) proposed an ML-based approach to identify profitable crops and forecast yields using algorithms like SVM, ANN, RF, multivariate regression, and k-NN. 
Tahaseen and Moparthi (2021) demonstrated how various ML techniques can predict crop yields based on factors such as weather and temperature, with dataset availability influencing feature selection.
  • 7.
    • Sharma etal. (2021) examined methods for weed and pest detection, crop prediction, and leaf disease diagnosis, discussing the state of global agricultural yield forecasting. Ray et al. (2022) used distribution and correlation analysis to propose a model for 22 crop types, achieving an accuracy of 99.54%. Vashisht et al. (2022) applied extreme learning machines to predict rice yield based on geographical and seasonal factors. Gupta et al. (2022) emphasized the potential of ML to segment large datasets for yield prediction. • Seireg et al. (2022) utilized cascading and stacking regression to predict blueberry yield with high accuracy, while Rasheed et al. (2021) tested a decision-making tool on historical agricultural data in Pakistan to predict net profits. Pant et al. (2021) defined the use of ML techniques to identify trends in data for crop prediction. Chandraprabha et al. (2021) utilized predictive analytics for soil nutrient forecasting, while Raja et al. (2022) demonstrated that ensemble techniques can enhance yield predictions over traditional classification methods. • Cedric et al. (2022) presented a decision tree and k-NN-based ML model for forecasting crop yields in West Africa. Ali et al. (2022) employed remote sensing and statistical models to evaluate crop production. Pantazi et al. (2016) predicted wheat yield using unsupervised learning with satellite and soil data. Aghighi et al. (2018) predicted maize yield using time-series imagery from Landsat 8, introducing a modified feature selection method that outperformed others with 95% accuracy (Mariammal et al., 2021). • Kumar et al. (2021) incorporated pre-processing, exploratory data analysis (EDA), and detection modules for plant disease prediction, achieving over 98% accuracy. Ziliani et al. (2022) combined the APSIM crop model with CubeSat images to produce high-resolution yield maps. Vlachopoulos et al. 
(2022) determined that random forests were the best for green area index (GAI) prediction with an RMSE of 10.86%. • Goel and Mishra (2022) achieved 95.64% accuracy using deep learning for phenological data, while Elavarasan and Vincent (2020) found that Q-learning networks offered superior yield predictions. Haque et al. (2020) applied the ANN method to examine the impact of different factors on crop yield, using error rates to evaluate performance. Cunha and Silva (2020) developed a model that used weather forecasts and crop calendars to predict yields. Bose et al. (2016) employed spiking neural networks to analyze remote sensing data for crop yield prediction, achieving an accuracy of 95.64%.
  • 8.
    • Saeed and Lizhi (2019) developed a deep neural network (DNN) approach to enhance prediction accuracy, while Sun et al. (2020) integrated RNN and CNN for extracting spatial and temporal features from time-series data. Qiao et al. (2021) introduced a deep learning architecture combining RNNs and 3D CNNs for crop yield forecasting from multispectral images. Kalaiarasi and Anbarasi (2022) introduced a multiple kernel DNN to enhance learning capacity for medium-scale agricultural datasets.
    • Abbaszadeh et al. (2022) combined deep learning networks like 3DCNN and ConvLSTM to predict soybean yield, with probabilistic outputs. Pang et al. (2020) used CNNs and hyperspectral imaging to model spectral data, comparing PCA and multidimensional scattering correction. Alebele et al. (2021) applied Gaussian kernel regression to rice yield prediction, outperforming other Bayesian methods.
    • Martinez et al. (2021) utilized Gaussian processes to identify climate extremes affecting crop productivity, while Qiao et al. (2021) developed a 3D convolutional neural multi-kernel network to capture hierarchical features for yield prediction. Sivanantham et al. (2022) improved accuracy by using orthogonal basis functions and quantile regression.
    • Li et al. (2022) focused on combining solar-induced fluorescence (SIF), satellite, and environmental data for crop yield prediction. Gupta et al. (2021) applied MapReduce architecture and K-means clustering for crop prediction based on soil and weather data. Liu et al. (2022) used MLR to predict plant diseases with 91% accuracy, while Udutalapally et al. (2021) trained CNNs to achieve 99.24% accuracy in disease prediction. Makkithaya and G. (2022) used deep residual networks for soybean prediction, while Mehta et al. (2021) compared CNN and LSTM models for crop yield forecasting.
    • Mopidevi et al. (2022) employed deep learning to predict Ficus stem growth, while Swarnakantha et al. (2022) evaluated comparative studies on crop development. Bhansali et al. (2022) built a recommendation model using N-P-K and rainfall data to diagnose diseases and provide treatment suggestions, while Nancy et al. (2022) developed an image-based plant disease detection system using machine learning and deep learning.
  • 9.
    METHODOLOGY Flow of the Proposed System
  • 10.
    Dataset
    The dataset consists of the parameters Nitrogen (N), Phosphorus (P), Potassium (K), soil pH, Humidity, Temperature and Rainfall. It was obtained from the Kaggle website.
    Features       Description
    N              Nitrogen content in the soil (kg/ha)
    P              Phosphorus content in the soil (kg/ha)
    K              Potassium content in the soil (kg/ha)
    Temperature    Average temperature (°C)
    Humidity       Average humidity (%)
    pH             pH level of the soil
    Rainfall       Average rainfall (mm)
    Label          Categorical variable indicating the recommended crop
  • 11.
    Data preprocessing
    • Before building the model, the following preprocessing steps were applied to the dataset:
    • Handling Missing Data
    • The dataset was checked for missing values with crop.isnull().sum(). The check found no missing data points, so the dataset was complete.
    • The data type of each feature was inspected with crop.info() to confirm that numerical and categorical columns were typed correctly, preventing issues during model training.
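    The missing-value check described above can be sketched without any third-party dependency. This is only an illustration of what crop.isnull().sum() reports (one count of missing cells per column), not the code used in the study. The column names and the three sample rows below are hypothetical and made up for the example.

```python
import csv
import io

# Hypothetical sample: column names and values are illustrative assumptions,
# not rows taken from the study's dataset.
SAMPLE_CSV = """N,P,K,temperature,humidity,ph,rainfall,label
90,42,43,20.8,82.0,6.5,202.9,rice
85,58,41,21.7,80.3,7.0,226.6,rice
60,55,44,23.0,82.3,7.8,263.9,rice
"""


def count_missing(rows, columns):
    """Count empty or absent cells per column, similar in spirit to isnull().sum()."""
    counts = {col: 0 for col in columns}
    for row in rows:
        for col in columns:
            value = row.get(col)
            # DictReader yields None for cells missing from a short row
            # and "" for cells that are present but empty.
            if value is None or value.strip() == "":
                counts[col] += 1
    return counts


reader = csv.DictReader(io.StringIO(SAMPLE_CSV))
columns = reader.fieldnames
rows = list(reader)
missing = count_missing(rows, columns)
print(missing)
```

    A column whose count is above zero would need imputation or row removal before training; in the study every count was reported as zero.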
  • 12.
  • 13.
    Normalization and Standardization
    • The dataset underwent normalization and standardization. First, the features were scaled with MinMaxScaler(), which maps every feature value into the range 0 to 1 so that features with larger ranges do not dominate the training process.
    • StandardScaler() was then applied, which shifts and rescales each feature so that it has a mean of 0 and a standard deviation of 1. Standardization puts features measured in different units on a common scale, which is particularly beneficial for algorithms that assume roughly normally distributed inputs or that are sensitive to feature scale.
    • This preprocessing is intended to improve the model's performance and convergence speed.
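    The two transformations can be written out by hand to show what each scaler computes per feature. The sketch below is a dependency-free illustration, not the study's code: min_max_scale and standardize are hypothetical helper names, the rainfall values are made up, and the standard deviation is the population one (which is what StandardScaler uses).

```python
from statistics import fmean, pstdev


def min_max_scale(values):
    """Map values linearly onto [0, 1]; assumes the feature is not constant."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift to mean 0 and rescale to population standard deviation 1."""
    mu = fmean(values)
    sigma = pstdev(values)  # population standard deviation (divides by n)
    return [(v - mu) / sigma for v in values]


# Illustrative rainfall values in mm (made up for the example).
rainfall = [202.9, 226.6, 263.9, 242.8]

scaled = min_max_scale(rainfall)   # step 1: values now lie in [0, 1]
both = standardize(scaled)         # step 2: mean 0, standard deviation 1
direct = standardize(rainfall)     # standardizing the raw values directly

print(scaled)
print(both)
```

    One thing the sketch makes visible: both steps are linear rescalings, so the final values in both match those in direct up to rounding. The scale seen by the model is therefore determined by the last scaler applied.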
  • 14.
    Feature Correlation
    A correlation matrix was generated to understand the relationships between features. The matrix showed that nitrogen (N) and phosphorus (P) had a weak negative correlation (-0.23), while phosphorus (P) and potassium (K) exhibited a strong positive correlation (0.74), indicating that these two elements often vary together in the dataset.
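    Each entry of such a matrix is a Pearson correlation coefficient between two feature columns. The sketch below shows the calculation on a tiny made-up sample. The helper names and the numbers are assumptions for illustration, so the output will not reproduce the -0.23 and 0.74 reported for the real dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Build a nested dict: matrix[a][b] is the correlation of columns a and b."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up nutrient readings (kg/ha), for illustration only.
features = {
    "N": [90, 85, 60, 74, 78],
    "P": [42, 58, 55, 35, 42],
    "K": [43, 41, 44, 40, 42],
}

matrix = correlation_matrix(features)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

    Values close to 1 or -1 indicate features that move together or in opposition, while values near 0 indicate little linear relationship.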
  • 15.
    Feature Selection
    • This step focuses on identifying and using the most relevant attributes in the dataset. It removes irrelevant and redundant information before the classifiers are applied.
    • The proposed system applied several machine learning algorithms: Decision Tree, Naïve Bayes (NB), Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF) and XGBoost.
  • 16.
    Random Forest
    • Random forest first builds new datasets from the original data by randomly selecting rows with replacement (bootstrapping). A decision tree is then trained on each bootstrapped dataset independently. The model also randomly selects a subset of features for each tree and uses only them for training. Since this is a classification problem, the prediction is made by taking the majority vote of all the decision trees.
    • This classification can also be expressed mathematically as shown:
      $$H(x) = \arg\max_{y \in Y} \sum_{i=1}^{N} \mathbb{I}\big(h_i(x) = y\big)$$
    • Where:
    • $H(x)$ is the final predicted class for input $x$.
    • $h_i(x)$ is the prediction of the $i$th decision tree.
    • $Y$ represents the possible classes.
    • $\mathbb{I}(\cdot)$ is an indicator function that equals 1 if the condition is true and 0 otherwise.
    • $N$ is the total number of decision trees in the forest.
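    The bootstrapping and majority-vote steps can be sketched in a few lines. This is a minimal illustration of the voting rule only, not a trained random forest. The three "trees" are hypothetical hand-written threshold rules standing in for fitted decision trees, and their thresholds are made up.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows at random with replacement (one bootstrapped dataset)."""
    return [rng.choice(rows) for _ in rows]


def majority_vote(tree_predictions):
    """Return the class predicted by the most trees.

    Ties go to the class that was seen first among the tied ones.
    """
    return Counter(tree_predictions).most_common(1)[0][0]


def forest_predict(trees, x):
    """Collect one prediction per tree for input x and take the majority vote."""
    return majority_vote([tree(x) for tree in trees])


# Hypothetical stand-ins for trained trees; thresholds are illustrative only.
stub_trees = [
    lambda x: "rice" if x["rainfall"] > 150 else "maize",
    lambda x: "rice" if x["humidity"] > 75 else "maize",
    lambda x: "maize" if x["N"] < 50 else "rice",
]

sample = {"N": 40, "humidity": 80.0, "rainfall": 200.0}
prediction = forest_predict(stub_trees, sample)
print(prediction)  # rice (two votes for rice, one for maize)

rng = random.Random(0)
rows = list(range(10))
print(bootstrap_sample(rows, rng))  # same length as rows, duplicates allowed
```

    In a real forest each stand-in rule would be a decision tree fitted on its own bootstrapped sample and feature subset.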
  • 17.
    Evaluation metrics
    • The coefficient of determination (R²) is used as the evaluation metric for measuring the accuracy of all the models. R² is a statistical measure of how much of the variation in one variable can be explained by a second variable when predicting the outcome of an event.
    • The formula for R² can be expressed as:
      $$R^2 = 1 - \frac{RSS}{TSS}$$
    • Where:
    • R² is the coefficient of determination.
    • RSS is the Residual Sum of Squares, which represents the total squared difference between the actual and predicted values.
    • TSS is the Total Sum of Squares, which is the total squared difference between the actual values and their mean.
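    As a worked example of the formula, the sketch below computes RSS, TSS and R² directly from their definitions. The function name and the numbers are illustrative assumptions, not results from the study.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - RSS / TSS."""
    mean_actual = sum(actual) / len(actual)
    # RSS: squared differences between actual and predicted values.
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # TSS: squared differences between actual values and their mean.
    tss = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - rss / tss


actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

# RSS = 0.01 + 0.01 + 0.04 + 0.04 = 0.10 and TSS = 2.25 + 0.25 + 0.25 + 2.25 = 5.0
print(round(r_squared(actual, predicted), 2))  # 0.98
```

    Perfect predictions give R² = 1, and always predicting the mean of the actual values gives R² = 0.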
  • 18.
    Conclusion
    • Using datasets, machine learning models can predict with reasonable accuracy whether a crop will be profitable or not. This study used four different machine learning algorithms to recommend crops according to the weather conditions and soil nutrients. Random forest outperformed the rest of the algorithms in this study, with a testing R² of about 99%. Through this work, farmers can increase the productivity of their agriculture and prevent soil degradation on cultivated land.
    • They can also reduce the use of chemicals in crop production and make better use of water resources. Further research can be conducted by considering more varieties of crops. The current research focuses on twenty-two crops due to the limited availability of data. In future studies, soil fertility could be assessed under more granular geographical conditions, based on micronutrient data such as sulfur, zinc, iron and manganese. A machine learning framework could also be built to recommend optimum amounts of pesticides and fertilizers for a particular crop. Doing so could increase both the production of quality crops and the profits of farmers.
  • 19.
    References
    1. Stekhoven, D. J., and Buhlmann, P. (2012). MissForest: nonparametric missing value imputation for mixed-type data.
    2. Sujjaviriyasup, T., and Pitiruek, K. (2013). Agricultural product forecasting using a machine learning approach.
    3. Tavares, O. C. H., Santos, L. A., Filho, D. F., Ferreira, L. M., Garcia, A. C., Castro, T. A. V. T., et al. (2021). Response surface modeling of humic acid stimulation of the rice (Oryza sativa L.) root system. Arch. Agron.
    4. Uddin, K., Matin, M. A., and Meyer, F. J. (2019). Operational flood mapping using multitemporal Sentinel-1 SAR images: A case study from Bangladesh. Remote Sensing.
    5. Van Ittersum, M. K., Cassman, K. G., Grassini, P., Wolf, J., Tittonell, P., and Hochman, Z. (2013). Yield gap analysis with local to global relevance: a review.
    6. Van Klompenburg, T., Kassahun, A., and Catal, C. (2020). Crop yield prediction using machine learning.
    7. Sharma, R., Kamble, S. S., Gunasekaran, A., Kumar, V., and Kumar, A. (2020). A systematic literature review on machine learning applications for sustainable agriculture supply chain performance.
    8. Shehadeh, A., Alshboul, O., Al Mamlook, R. E., and Hamedat, O. (2021). Machine learning models for predicting the residual value of heavy construction equipment: An evaluation of modified decision tree, LightGBM, and XGBoost regression. Automation Construction.
    9. Siddique, M. N. E. A., de Bruyn, L. A. L., Osanai, Y., and Guppy, C. N. (2022). Typology of rice-based cropping systems for improved soil carbon management.
    10. P., Altman, D. G., and Sauerbrei, W. (2016). Dichotomizing continuous predictors in multiple regression: a bad idea.
    11. Sagi, O., and Rokach, L. (2018). Ensemble learning: A survey. Wiley Interdiscip. Reviews: Data Min. Knowledge Discovery.
    12. Sarker, M. A. R., Alam, K., and Gow, J. (2019). Performance of rain-fed Aman rice yield in Bangladesh in the presence of climate change. Renewable Agric. Food systems.