The document describes two research activities (RAs) on county-level corn yield prediction using deep learning methods. RA-1 uses a Bayesian neural network to provide both yield predictions and uncertainty estimates, outperforming other models. RA-2 addresses the lack of transferability of models across regions through unsupervised domain adaptation strategies like adversarial domain adaptation networks, which improve predictions in data-scarce regions without labeled data. The proposed adaptive and Bayesian domain adaptation networks achieve better spatial transferability than compared methods.
1. County-Level Corn Yield Prediction with
Deep Learning
Yuchi Ma, Ph.D.
Earth Systems Science
Stanford University
August 2023
2. Outline
• Introduction
• Materials
• Research Activities (RAs)
• Bayesian Neural Network for Corn Yield Prediction and Uncertainty
Analysis
• Adversarial Domain Adaptation on Corn Yield Prediction
3. Introduction – Background
Credit: UN, USDA
• Corn
• The U.S. is the biggest corn producer and exporter.
• Corn is the most valuable crop in America.
• Corn plays a significant role in the American diet and food industries.
• Corn Yield Prediction
• Food security monitoring
• Food market planning
• Farm resource management
• Risk assessment
4. Introduction – Background
Credit: USDA
• United States Department of Agriculture National Agricultural
Statistics Service (USDA-NASS)
• Publish monthly corn yield prediction at the state level from August to November
through the NASS Crop Production Progress Report.
• Publish corn yield statistics at the county level in February of the next year through
the NASS Quick Stats.
• Limitations:
• Large-scale field surveys are required, which are labor-intensive and time-consuming
• Low spatial and temporal resolution
• Cannot meet the need for in-season county-level yield prediction
5. Introduction – Remote Sensing + Machine Learning
Machine learning
Model
Ridge Regression, Random
Forests, Neural Networks …
Multi-Source Input
Predictors
Vegetation Indices, Weather
variables …
Predicted Yield
Traditional Machine Learning (ML)
• Associate input predictors X with response variable y
• No need for explicit programming or knowledge of the physiological mechanisms of individual crops.
• Ridge Regression, Random Forest (RF), etc.
Deep Learning (DL)
• Simulate neurons in human brain
• Automatic feature learning
• Multiple Layer Perceptron (MLP), Recurrent
Neural Network (RNN), etc.
6. Introduction – Challenges
# How to quantify predictive uncertainty and increase robustness for
supervised deep learning models on corn yield prediction?
Bottlenecks of point-estimate neural networks:
➢ A huge amount of training data is necessary
➢ Prone to overfitting
➢ Lack of uncertainty information
7. Introduction – Challenges
# How to improve DL models’ spatial transferability through unsupervised
domain adaptation strategies?
Bottlenecks of supervised machine learning:
➢ Untrainable for regions without reference yield records
➢ DL models are region-specific (or domain-specific)
➢ Lack of transferability due to domain shift
Figure: domain shift between a data-abundant source region/domain and a data-scarce target region/domain.
8. Materials – Study Area
• U.S. Corn Belt
• The top corn-producing region
• Abundant county-level yield records
• Yield records from USDA NASS
9. Materials – Feature Variables
| Category | Variables | Unit | Related properties | Spatial Resolution | Temporal Resolution | Source | Latency |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Satellite Imagery | Enhanced Vegetation Index (EVI) | – | Plant vigor | 500 m | Daily | MODIS | One day |
| Satellite Imagery | Green Chlorophyll Index (GCI) | – | Plant vigor | 500 m | Daily | MODIS | One day |
| Satellite Imagery | Normalized Difference Water Index (NDWI) | – | Plant vigor | 500 m | Daily | MODIS | One day |
| Satellite Imagery | Daytime Land Surface Temperature (LSTday) | Kelvin | Heat stress | 1 km | Daily | MODIS | One day |
| Satellite Imagery | Nighttime Land Surface Temperature (LSTnight) | Kelvin | Heat stress | 1 km | Daily | MODIS | One day |
| Climate | Mean Temperature (Tmean) | °C | – | 4 km | Daily | PRISM | One day |
| Climate | Maximum Temperature (Tmax) | °C | – | 4 km | Daily | PRISM | One day |
| Climate | Minimum Temperature (Tmin) | °C | – | 4 km | Daily | PRISM | One day |
| Climate | Total Precipitation (PPT) | mm | Water stress | 4 km | Daily | PRISM | One day |
| Climate | Maximum Vapor Pressure Deficit (VPDmax) | hPa | Water stress | 4 km | Daily | PRISM | One day |
| Climate | Minimum Vapor Pressure Deficit (VPDmin) | hPa | Water stress | 4 km | Daily | PRISM | One day |
| Climate | GLDAS Water Stress (GLDASws) | – | Water stress | 0.25 arc degree | 3-hourly | GLDAS | One month |
| Climate | Evapotranspiration (ET) | mm | Water stress | 0.25 arc degree | 3-hourly | GLDAS | One month |
| Soil | Available Water Holding Capacity (AWC) | cm | Soil water uptake | 30 m | N/A | SSURGO | N/A |
| Soil | Soil Organic Matter (SOM) | kg/m² | Soil nutrient uptake | 30 m | N/A | SSURGO | N/A |
| Soil | Cation Exchange Capacity (CEC) | cmol/kg | Soil nutrient uptake | 30 m | N/A | SSURGO | N/A |
| Others | Year | N/A | – | County level | N/A | USDA NASS | N/A |
| Others | Historical average yield | t/ha | – | County level | N/A | USDA NASS | N/A |
10. Materials – Data Preprocessing
Pipeline: MODIS, PRISM, GLDAS, … → Spatial Filtering → Spatial Aggregation → Temporal Aggregation → Pair with Yield Records
• Spatial Filtering
• Filter out feature variables over non-corn areas
• Spatial Aggregation
• Aggregate feature variables into the county level
by calculating the mean value in each county
• Make sure all feature variables have the same
spatial resolution
• Temporal Aggregation
• Aggregate time-series feature variables into 16-day intervals by calculating the mean value in each 16-day period.
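The temporal-aggregation step above amounts to a reshape-and-mean over 16-day windows. A minimal numpy sketch; `aggregate_16day` is a hypothetical helper for illustration, not the original pipeline code:

```python
import numpy as np

def aggregate_16day(daily_values):
    """Mean-composite a daily, county-averaged series into 16-day periods.

    daily_values: 1-D daily series for one county (already spatially
    filtered to corn pixels and averaged within the county).
    Trailing days that do not fill a full 16-day period are dropped.
    """
    daily_values = np.asarray(daily_values, dtype=float)
    n_periods = len(daily_values) // 16
    return daily_values[: n_periods * 16].reshape(n_periods, 16).mean(axis=1)

# 32 days of a toy series -> two 16-day composites
series = np.concatenate([np.full(16, 0.2), np.full(16, 0.4)])
print(aggregate_16day(series))  # [0.2 0.4]
```

The same reshape trick applies to any of the daily variables (EVI, Tmean, PPT, …) once they are county-aggregated.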
11. RA-1: Methodology
Traditional Neural Networks:
• Point estimates of weights
• Output only predicted values
• Prone to overfitting
Bayesian Neural Networks:
• Estimate posterior distributions of weights
• Output both target values and uncertainty
• A special form of ensemble learning →
Robust to overfitting
Figure: traditional neural network (point estimates of weights) vs Bayesian neural network (posterior estimates of weights).
12. RA-1: Methodology
The proposed BNN model:
• A feature extraction net to extract features from input predictors
• Two independent nets, the yield net and the uncertainty net, to predict the mean ŷ and the standard deviation σ̂ of the predictive yield distribution.
• The uncertainty net outputs σ̂ as a measure of predictive uncertainty.
Ma et al. 2021a
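A common way to train such a two-headed network is a Gaussian negative log-likelihood over the predicted mean ŷ and standard deviation σ̂. A minimal numpy sketch of that loss, assuming a Gaussian predictive distribution (the paper's exact training objective may differ):

```python
import numpy as np

def gaussian_nll(y, mu, sigma):
    """Mean negative log-likelihood of y under N(mu, sigma^2).

    Training the yield net (mu) and uncertainty net (sigma) jointly
    with this loss lets sigma act as a per-sample predictive
    uncertainty: sigma must grow where errors are large and shrink
    where they are small.
    """
    sigma = np.maximum(sigma, 1e-6)  # numerical floor on the predicted std
    return np.mean(0.5 * np.log(2.0 * np.pi * sigma ** 2)
                   + (y - mu) ** 2 / (2.0 * sigma ** 2))

y = np.array([10.0, 11.0])   # observed yields (t/ha)
mu = np.array([10.0, 10.5])  # predicted means
# when errors are small, inflating sigma increases the loss
print(gaussian_nll(y, mu, np.array([0.5, 0.5])) <
      gaussian_nll(y, mu, np.array([5.0, 5.0])))  # True
```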
13. RA-1: Experimental Setup
• Experimental years: 2001-2019
• Testing years: 2010-2019
• Training strategy: using one year for testing and all preceding years for training
• Comparison methods:
• Ridge Linear Regression (Ridge)
• Random Forest (RF)
• Support Vector Regressor (SVR)
• Multilayer Perceptron (MLP)
• Recurrent Neural Network with Long Short-Term Memory (LSTM) cell
• Evaluation:
• End-of-season: using all non-sequential and sequential predictors and making
predictions on Oct 4th
• In-season: using all non-sequential and part of the sequential predictors and making predictions during the growing season
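The training strategy above (test on one year, train on all preceding years) can be sketched as a simple split generator; `year_splits` is a hypothetical helper for illustration:

```python
def year_splits(years, test_years):
    """Map each test year to its training years: all preceding years."""
    return {t: [y for y in years if y < t] for t in test_years}

years = list(range(2001, 2020))       # experimental years 2001-2019
test_years = list(range(2010, 2020))  # testing years 2010-2019
splits = year_splits(years, test_years)
print(splits[2010])  # train on 2001-2009 when testing on 2010
```

This forward-chaining split avoids leaking future years into training, mimicking an operational in-season forecast.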
15. RA-1: Evaluation Results
In-season evaluation:
• Poor performance during the early season (i.e., before July)
• Performance improves as more informative features are included
• Optimal performance is achieved in August, two months before the harvest
Ma et al. 2021a
18. RA-1: Conclusion
RA-1: Bayesian Neural Network for Corn Yield Prediction and
Uncertainty Analysis
• The proposed BNN model outperforms other ML and DL models.
• Accurate corn yield prediction can be made in August, two months ahead of the harvest.
• The predictive uncertainty has a strong correlation with the prediction error.
19. RA-2: Background
Bottlenecks of supervised ML&DL:
• Labeled data are required for model training → impossible to train
models for regions without yield records.
• Due to domain shift, supervised learning models tend to be location-specific → models trained in the label-rich region (source domain) lose validity when directly applied to the label-scarce region (target domain).
20. RA-2: Background
Fine-tuning-based transfer learning:
• Idea: Pretrain a source model with abundant labeled source samples;
Fine-tune the source model with a few labeled target samples.
• Issue: A limited number of labeled target samples are still needed.
21. RA-2: Methodology
Unsupervised Domain Adaptation (UDA):
• Idea: Reduce domain shift by aligning feature distributions in the
source domain and the target domain.
• Advantage: No need for labeled samples from the target domain.
Figure: a source-domain predictor mispredicts high- and low-yield samples in the target domain; after UDA, a cross-domain predictor aligns the source and target feature distributions. (Ma and Zhang, 2022)
22. RA-2: Methodology
Domain Adversarial Neural Network (DANN)
• Feature Extractor 𝐺𝑓: extract task-informative and domain-invariant features
• Yield Predictor 𝐺𝑦: make yield predictions
• Domain Classifier 𝐺𝑑: classify the domain labels
• Gradient Reversal Layer (GRL): Reverse the gradient during backpropagation
Ma et al. 2021b
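The GRL above acts as an identity in the forward pass and flips the sign of the gradient in the backward pass, so that minimizing the domain loss downstream maximizes it upstream of the layer. A minimal numpy sketch with hypothetical helper names, not the paper's implementation:

```python
import numpy as np

def grl_forward(x):
    """Forward pass: the GRL is an identity on the features."""
    return x

def grl_backward(grad, lam=1.0):
    """Backward pass: reverse (and scale by lam) the gradient flowing
    from the domain classifier back to the feature extractor."""
    return -lam * grad

features = np.array([0.3, -1.2])
assert np.allclose(grl_forward(features), features)  # unchanged forward
print(grl_backward(np.array([0.5, -0.5])))  # [-0.5  0.5]
```

In an autodiff framework this is typically implemented as a custom op whose backward method negates the incoming gradient.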
23. RA-2: Methodology
• Training objectives:
1. Collaboratively train the feature extractor 𝐺𝑓 and the yield predictor 𝐺𝑦 to
minimize the yield prediction loss 𝐿𝑦 → extract task-informative features
(Eq. (2.2))
2. Adversarially train the feature extractor 𝐺𝑓 and the domain classifier 𝐺𝑑 via
GRL to maximize the domain loss 𝐿𝑑 → extract domain-invariant features
(Eq. (2.3))
• Loss function:

$L = L_y - \lambda L_d$  (2.1)

$L_y = \frac{1}{n_S} \sum_{i=1}^{n_S} (y_i - \hat{y}_i)^2$  (2.2)

$L_d = -\frac{1}{n_S + n_T} \sum_{i=1}^{n_S + n_T} \left[ d_i \log \hat{d}_i + (1 - d_i) \log(1 - \hat{d}_i) \right]$  (2.3)

where $\lambda$ is a weighting parameter.
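The objective in Eqs. (2.1)-(2.3) can be computed directly from arrays. A minimal numpy sketch with toy values; `dann_loss` is an illustrative helper, not the paper's code:

```python
import numpy as np

def dann_loss(y_src, yhat_src, d, dhat, lam):
    """Combined DANN objective L = L_y - lam * L_d.

    y_src, yhat_src: source-domain yields and predictions (n_S,)
    d, dhat: domain labels and classifier outputs over all
    n_S + n_T source and target samples.
    """
    L_y = np.mean((y_src - yhat_src) ** 2)  # Eq. (2.2): MSE on source samples
    dhat = np.clip(dhat, 1e-7, 1 - 1e-7)    # avoid log(0)
    L_d = -np.mean(d * np.log(dhat) + (1 - d) * np.log(1 - dhat))  # Eq. (2.3)
    return L_y - lam * L_d                  # Eq. (2.1)

# toy example: perfect yield predictions, maximally confused domain classifier
L = dann_loss(np.array([1.0, 2.0]), np.array([1.0, 2.0]),
              np.array([0.0, 1.0]), np.array([0.5, 0.5]), lam=1.0)
print(L)  # -log(2): the domain loss is maximized, as intended
```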
24. RA-2: Methodology
Adaptive Domain Adversarial Neural Network (ADANN)
• Issue: the yield prediction loss (in the form of MSE) can have a quite different magnitude from the domain classification loss (in the form of cross-entropy).
• Solution: the weighting parameter $\lambda_i$ is adaptively determined according to the learning progress $p_i$ and the normalization term $r_i$:

$p_i = \frac{i}{N}$  (2.4)

$r_i = \frac{L_{y_i}}{L_{d_i}}$  (2.5)

$\lambda_i = r_i \left( \frac{2}{1 + \exp(-p_i)} - 1 \right)$  (2.6)
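The adaptive weighting of Eqs. (2.4)-(2.6) is a one-liner per equation. A minimal numpy sketch; `adaptive_lambda` is an illustrative helper following the slide's formulas:

```python
import numpy as np

def adaptive_lambda(i, N, L_y, L_d):
    """Adaptive weight for training step i of N total steps."""
    p = i / N                                    # Eq. (2.4): training progress
    r = L_y / L_d                                # Eq. (2.5): loss-magnitude normalization
    return r * (2.0 / (1.0 + np.exp(-p)) - 1.0)  # Eq. (2.6): smooth ramp-up

print(adaptive_lambda(0, 100, 4.0, 0.5))  # 0.0 at the start of training
```

The sigmoid-style ramp keeps the adversarial term weak early on, while $r_i$ rescales it to the magnitude of the yield loss.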
25. RA-2: Methodology
Bayesian Domain Adversarial Neural Network (BDANN)
• Issue: many source samples are required to train a reliable ADANN.
• Solution: apply Bayesian inference and propose BDANN to improve model performance on small training datasets.
26. RA-2: Experimental Setup
Study area - two ecological regions defined by the U.S. Environmental Protection Agency (EPA):
• Eastern Temperate Forests (ETF)
• Warm, humid, and temperate
• Mild winters and humid summers
• High plant biodiversity
• Great Plains (GP)
• Semiarid with high winds
• Harsh winters and hot summers
• Lack of forests and rainfall
Ma et al. 2021a
27. RA-2: Experimental Setup
• Experimental years: 2006-2019
• Testing years: 2016-2019
• Training strategy: using one year for testing and all preceding years
for training
• Comparison methods:
• Random Forest (RF)
• Fully-connected Deep Neural Network (DNN)
• DANN with a fixed weighting parameter
• Evaluation:
• GP → ETF: models are trained in GP and tested in ETF.
• ETF → GP: models are trained in ETF and tested in GP.
29. RA-2: Absolute Error Maps
Figure: county-level absolute error maps (t/ha) for RF, DNN, DANN, ADANN, and BDANN in the GP → ETF and ETF → GP experiments.
30. RA-2: Conclusion
RA-2: Adversarial Domain Adaptation on Corn Yield Prediction
1. The UDA strategy was used for corn yield prediction.
2. Two adversarial domain adaptation models were proposed for county-level corn
yield prediction.
3. The proposed ADANN and BDANN outperformed RF, DNN, and DANN, showing better transferability across spatial domains.
4. The BDANN model could generalize well on small training sets.
31. Selected Publications – Related
RA-1:
• Ma, Y., Zhang, Z., Kang, Y. and Özdoğan, M., 2021a. Corn yield prediction and uncertainty analysis based
on remotely sensed variables using a Bayesian neural network approach. Remote Sensing of Environment,
259, p.112408.
• Chen, S., Liu, W., Feng, P., Ye, T., Ma, Y. and Zhang, Z., 2022. Improving Spatial Disaggregation of Crop
Yield by Incorporating Machine Learning with Multisource Data: A Case Study of Chinese Maize Yield.
Remote Sensing, 14(10), p.2340.
RA-2:
• Ma, Y., Zhang, Z., Yang, H.L. and Yang, Z., 2021b. An adaptive adversarial domain adaptation approach for
corn yield prediction. Computers and Electronics in Agriculture, 187, p.106314.
• Ma, Y. and Zhang, Z., 2022. A Bayesian Domain Adversarial Neural Network for Corn Yield Prediction.
IEEE Geoscience and Remote Sensing Letters, 19, pp. 1-5.