Introduction
• Management recommendations are generalized and need to be refined
• Infrared spectroscopy is developing rapidly and is widely used: spectral
libraries are growing, with potential for cheap assessments
• Soil parameters are predicted well from soil spectra
• Statistical approaches for data mining have seen a similar revolution
• Together these present an opportunity to take a soil sample, scan it and
predict yield
• The work is on-going and pioneering; the future aim is to inform farmers,
cheaply, on the specific nutrients to apply
• No known direct linkage between crop response and soil spectra
• Low productivity, high yield gap
among smallholder farmers
• Majority cannot afford conventional
soil testing
Key objectives
• Determine variance in yield explained by spectra alone
and by soil parameters alone
• Determine additional variance in yield explained by
topographic and weather variables
• Find out to what extent soil spectra can predict
response to fertilizer.
http://afsis-dt.ciat.cgiar.org/
No.  Response
1    Control (unfertilized) yield
2    NPK yield (100 kg N, 30 kg P and 60 kg K/ha)
3    Change in yield (NPK minus control)
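The third response is a per-plot difference between the two measured yields. A minimal sketch, assuming yields in t/ha held in two lists; the function name and the values are hypothetical and are not trial data:

```python
def yield_response(control, npk):
    """Return the per-plot change in yield (NPK minus control), in t/ha."""
    if len(control) != len(npk):
        raise ValueError("control and NPK yields must cover the same plots")
    return [fert - ctrl for ctrl, fert in zip(control, npk)]


# Hypothetical yields (t/ha) for three plots; not data from the trials.
control_yield = [1.0, 2.5, 4.0]
npk_yield = [3.0, 4.5, 4.5]
response = yield_response(control_yield, npk_yield)
```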
• Kiberashi, Tanzania: fertile, newly converted site; 29 plots
• Mbinga, Tanzania: breadbasket for the country; 35 plots
• Nkhata Bay, Malawi: acidity problems in some parts; 21 plots
Soil and topographic characterization
• Soil infrared spectra (0-20 cm depth)
• ASTER DEM-derived topographic covariates (30 m): slope, elevation,
flow accumulation, wetness index, aspect
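The slides do not say how the wetness index was derived. A common formulation is the topographic wetness index, TWI = ln(a / tan β), where a is the specific catchment area and β the local slope. The sketch below assumes that formulation; the function name, the 30 m default cell size and the minimum-slope guard are illustrative choices, not the authors' method:

```python
import math


def topographic_wetness_index(flow_accumulation_cells, slope_degrees,
                              cell_size_m=30.0, min_slope_degrees=0.1):
    """Return TWI = ln(a / tan(beta)) for a single grid cell.

    a is approximated as (upslope cells + 1) * cell size, i.e. the
    contributing area per unit contour width; beta is the local slope.
    """
    # Guard against flat cells, where tan(beta) would be zero.
    beta = math.radians(max(slope_degrees, min_slope_degrees))
    specific_catchment_area = (flow_accumulation_cells + 1) * cell_size_m
    return math.log(specific_catchment_area / math.tan(beta))
```

Cells that collect more upslope flow, or that sit on gentler slopes, receive a higher index.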
Seasonal weather
• Total seasonal rainfall
• Number of rainy days
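Both weather variables can be derived from a daily rainfall record. A minimal sketch, assuming daily totals in mm and a 1 mm threshold for a "rainy day"; the threshold, the function name and the sample values are assumptions, since the slides do not define them:

```python
def seasonal_rainfall_summary(daily_rain_mm, rainy_day_threshold_mm=1.0):
    """Return (total seasonal rainfall in mm, number of rainy days)."""
    total = sum(daily_rain_mm)
    # A day counts as rainy when it reaches the (assumed) threshold.
    rainy_days = sum(1 for rain in daily_rain_mm
                     if rain >= rainy_day_threshold_mm)
    return total, rainy_days


# Hypothetical six-day record (mm); not observed data.
daily = [0.0, 12.5, 0.5, 3.0, 0.0, 20.0]
total_mm, rainy_days = seasonal_rainfall_summary(daily)
```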
Statistical approaches
Random forest sampling:
• Samples selected for each tree: ~67% of total
• Remainder = out-of-bag samples
• Variables selected = sqrt(total no. of variables)
Source: Touw et al. 2013. Briefings in Bioinformatics 14(3): 315–326.
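The sampling scheme can be illustrated with a single bootstrap draw. This sketch assumes standard bagging, in which each tree draws as many samples as there are plots, with replacement. Under that assumption the expected share of distinct in-bag samples is 1 - 1/e, about 63%, close to the roughly two thirds quoted above. The function names and the seed are hypothetical:

```python
import math
import random


def bootstrap_split(n_samples, rng):
    """Draw one bootstrap sample; return (in-bag, out-of-bag) index lists."""
    drawn = {rng.randrange(n_samples) for _ in range(n_samples)}
    in_bag = sorted(drawn)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag


def variables_per_split(n_variables):
    """Number of candidate variables tried: sqrt of the total, rounded down."""
    return max(1, int(math.sqrt(n_variables)))


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# 85 = 29 + 35 + 21 plots listed for the three sites; whether all of them
# entered the model is not stated in the slides.
n_plots = 85
in_bag, out_of_bag = bootstrap_split(n_plots, rng)
in_bag_fraction = len(in_bag) / n_plots
```

The out-of-bag plots act as a built-in validation set for the tree grown on the in-bag plots.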
• Principal component analysis and OLS
• Partial least squares regression
• Random forest
Contribution to variance
explained by spectra, and
by additional covariates
Ordinary least squares regression
• Principal components
extracted from soil
spectra
• Collinearity between
additional covariates
checked
• Important variables
identified
• OLS model created
• 3-fold cross-validation
undertaken
Cross-validated R² = 0.67
RMSEP = 1.29 t/ha
Model: Maize grain yield (control) ~ 10 PCs + seasonal rainfall
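The cross-validated R² and RMSEP come from predictions made on held-out folds. The sketch below shows the mechanics of 3-fold cross-validation with the standard library only. It uses a single hypothetical predictor instead of the 10 PCs plus seasonal rainfall, and assigns folds by interleaving rather than at random; the data are invented for illustration:

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def cross_validate(x, y, n_folds=3):
    """Return (cross-validated R2, RMSEP) from k-fold cross-validation."""
    n = len(x)
    predictions = [0.0] * n
    for fold in range(n_folds):
        test_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]
        # Fit on the training folds only, then predict the held-out fold.
        intercept, slope = fit_line([x[i] for i in train_idx],
                                    [y[i] for i in train_idx])
        for i in test_idx:
            predictions[i] = intercept + slope * x[i]
    mean_y = sum(y) / n
    press = sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - press / ss_total, math.sqrt(press / n)


# Hypothetical predictor scores and yields (t/ha); not trial data.
scores = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5]
yields = [1.1, 1.4, 2.1, 2.4, 3.2, 3.3, 4.1, 4.4, 5.2]
r2_cv, rmsep = cross_validate(scores, yields)
```

Because every prediction is made for a plot the model has not seen, RMSEP is an estimate of prediction error rather than of fit.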
Partial least squares regression
• Components extracted from
soil spectra, maximizing
covariance with yield
• Contribution of
spectra alone revealed
• Important covariates
added in an OLS
model
• 3-fold cross-validation
undertaken
Cross-validated R² = 0.60
RMSEP = 1.41 t/ha
Model: Maize grain yield ~ Comp1 + Comp2 + flow accumulation, with 3-fold
cross-validation
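Comp1 and Comp2 are PLS score vectors. One standard way to obtain them for a single response is the NIPALS PLS1 algorithm, sketched below with the standard library only. Whether the authors used this algorithm is not stated; the function name and the small "spectra" matrix are hypothetical:

```python
import math


def pls1_scores(X, y, n_components=2):
    """Return the PLS1 score vectors (one list per component) via NIPALS."""
    n, p = len(X), len(X[0])
    # Centre predictors and response.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    mean_y = sum(y) / n
    yc = [yi - mean_y for yi in y]
    scores = []
    for _ in range(n_components):
        # Weight vector: direction in X with maximal covariance with y.
        w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        w = [wj / norm for wj in w]
        t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
        tt = sum(ti * ti for ti in t)
        loadings = [sum(Xc[i][j] * t[i] for i in range(n)) / tt
                    for j in range(p)]
        q = sum(yc[i] * t[i] for i in range(n)) / tt
        # Deflate X and y so the next component is orthogonal to this one.
        Xc = [[Xc[i][j] - t[i] * loadings[j] for j in range(p)]
              for i in range(n)]
        yc = [yc[i] - q * t[i] for i in range(n)]
        scores.append(t)
    return scores


# Hypothetical absorbances (6 samples x 4 bands) and yields (t/ha).
spectra = [
    [0.10, 0.20, 0.35, 0.50],
    [0.15, 0.22, 0.30, 0.55],
    [0.20, 0.18, 0.40, 0.45],
    [0.25, 0.30, 0.38, 0.60],
    [0.30, 0.26, 0.45, 0.52],
    [0.35, 0.33, 0.42, 0.65],
]
grain_yield = [1.0, 1.6, 1.9, 2.9, 3.1, 4.2]
comp1, comp2 = pls1_scores(spectra, grain_yield, n_components=2)
```

The resulting score vectors can then enter an OLS model alongside a covariate such as flow accumulation, as in the model line above.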
Further results with partial least squares
regression and random forest
Model / response                   Variance explained (%)   RMSE (t/ha)   No. of PCs
Partial least squares regression
  Control grain yield              60.5                     1.7           2
  NPK grain yield                  65.8                     1.39          10
  Response                         30.6                     1.8           2
Random forest
  Control grain yield              62.1                     1.65          18
  NPK grain yield                  49.4                     1.43          21
• Important additional covariates: Flow accumulation, slope and
elevation, explaining only 1-2% additional variance in yield
• Yield response to fertilizer is varied: low in both poor and fertile fields
Refining predictions
• Spectra explain ~7% more variance in yield than
soil parameters
• Topographic and climate data explain more
variance in yield when combined with soil
parameters than when combined with spectra
• Need to expand the initial library for
spectra-crop response linkages
• Include additional soil depths, and depth
restrictions, in the spectral analysis