As part of the GSP's capacity development and improvement programme, FAO/GSP organised a one-week training in Izmir, Turkey. The main goal of the training was to increase Turkey's capacity in digital soil mapping, new approaches to data collection, data processing, and modelling of soil organic carbon. The five-day training, titled "Training on Digital Soil Organic Carbon Mapping", was held at IARTC (International Agricultural Research and Education Center) in Menemen, Izmir, on 20-25 August 2017.
1. Yusuf YIGINI, PhD - FAO, Land and Water Division (CBL)
GSP - Eurasian Soil Partnership - Digital Soil Mapping and Modelling Training
Izmir, Turkey
21-25 August 2017
3. Cubist Model
Cubist is a prediction-oriented regression model
that combines the ideas in Quinlan (1992) and
Quinlan (1993).
4. Cubist Model
A tree is grown where the terminal leaves contain
linear regression models. These models are based
on the predictors used in previous splits. Also,
there are intermediate linear models at each step
of the tree. A prediction is made using the linear
regression model at the terminal node of the tree.
https://cran.r-project.org/web/packages/Cubist/vignettes/cubist.pdf
5. Cubist Model
The Cubist model first partitions the data into
subsets within which their characteristics are
similar with respect to the target variable and the
covariates. A series of rules (a decision tree
structure may also be defined if requested) defines
the partitions, and these rules are arranged in a
hierarchy.
Using R for Digital Soil Mapping - McBratney et al, 2016
6. Cubist Model
Each rule takes the form:
if [condition is true]
then [regress]
else
[apply the next rule]
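The rule form above can be sketched as a small R function. This is only an illustration of the if/then/else cascade: the function name `predict_rules`, the thresholds, and the coefficients are all made up and do not come from a fitted model.

```r
# Hypothetical two-rule cascade; thresholds and coefficients are invented
# purely to illustrate the "if condition then regress else next rule" form.
predict_rules <- function(dem, slp) {
  if (dem > 500) {
    # Rule 1: linear model applied when the condition is true
    2.0 + 0.01 * slp
  } else {
    # Rule 2: the next rule in the hierarchy handles the remaining cases
    1.0 + 0.002 * dem
  }
}

high <- predict_rules(dem = 800, slp = 10)  # falls under rule 1
low  <- predict_rules(dem = 300, slp = 10)  # falls through to rule 2
```

Each case is predicted by the linear model of the first rule whose condition it satisfies.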
7. Cubist Model
In R, fitting a Cubist model is relatively easy,
although it is worth spending some time
experimenting with the many controllable
parameters that the function has.
Using R for Digital Soil Mapping - McBratney et al, 2016
8. Cubist Model
In terms of specifying the target variable and covariates, we
do not define a formula as we did earlier for the MLR
model. Rather, we specify the columns explicitly: those that
are the covariates (x), and the one that is the target
variable (y).
Using R for Digital Soil Mapping - McBratney et al, 2016
> library(Cubist)
Loading required package: lattice
9. Cubist Model
# Randomly select 70% of the row indices for model calibration
training <- sample(nrow(DSM_table2), 0.7 * nrow(DSM_table2))
mdata <- DSM_table2[training, ]
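The split can also be made reproducible and paired with a held-out set. The sketch below uses a small synthetic data frame (`DSM_demo`, a hypothetical stand-in for `DSM_table2`), an arbitrary seed, and `round()` to get a whole-number sample size; none of these appear in the original slides.

```r
set.seed(42)  # arbitrary seed so the random split can be repeated

# Hypothetical stand-in for DSM_table2: 100 rows, one target, one covariate
DSM_demo <- data.frame(Value = runif(100), dem = runif(100, 0, 2000))

# 70% of the row indices go to calibration, the rest to validation
n_train  <- round(0.7 * nrow(DSM_demo))
training <- sample(nrow(DSM_demo), n_train)

calib <- DSM_demo[training, ]   # rows used to fit the model
valid <- DSM_demo[-training, ]  # rows kept aside for validation
```

Negative indexing with `-training` selects exactly the rows that were not sampled, so the two sets do not overlap.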
10. Cubist Model
In the example below we control the maximum number of
rules that can be used to partition the data
Using R for Digital Soil Mapping - McBratney et al, 2016
11. Cubist Model
Now we can fit the model!
Using R for Digital Soil Mapping - McBratney et al, 2016
ModelC <- cubist(x = mdata[, c("dem", "twi", "slp", "prec", "tmpn", "tmpd")],
                 y = mdata$Value,
                 committees = 1,
                 control = cubistControl(rules = 5, extrapolation = 5))
PredictedC <- predict(ModelC, newdata = DSM_table2[training, ])
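For readers without access to `DSM_table2`, the same call pattern can be tried on synthetic data. This is a minimal sketch: the data frame `demo`, its two covariates, and the piecewise relationship used to generate `Value` are invented for illustration, and the Cubist package is assumed to be installed.

```r
library(Cubist)

set.seed(1)  # arbitrary seed for repeatable synthetic data
n <- 200

# Synthetic covariates and a piecewise-linear target (all hypothetical)
demo <- data.frame(dem = runif(n, 0, 2000), slp = runif(n, 0, 30))
demo$Value <- ifelse(demo$dem > 1000,
                     3 + 0.05 * demo$slp,
                     1 + 0.001 * demo$dem) + rnorm(n, sd = 0.1)

# Covariates go in x, the target in y; at most 5 rules are allowed
fit <- cubist(x = demo[, c("dem", "slp")],
              y = demo$Value,
              committees = 1,
              control = cubistControl(rules = 5, extrapolation = 5))

# Predictions for the same rows that were used for fitting
pred <- predict(fit, newdata = demo[, c("dem", "slp")])
```

Naming the `control` argument explicitly avoids relying on positional matching.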
12. Cubist Model
The output from fitting a Cubist model can be retrieved
using the summary function. It reports the conditions and
the regression model for each rule, diagnostics of the
model fit, and how often each covariate was used in the
conditions and/or within a model.
Using R for Digital Soil Mapping - McBratney et al, 2016
14. Cubist Model
Using R for Digital Soil Mapping - McBratney et al, 2016
> summary(ModelC)
Call: cubist.default(x = mdata[, c("dem", "twi", "slp", "prec", "tmpn",
"tmpd")], y = mdata$Value, committees = 1, control
= cubistControl(rules = 5, extrapolation = 5))
Rule 1: [858 cases, mean 1.3514910, range 0.07426247 to 5.765974, est
err 0.5042074]
if
dem > 576
tmpd > 291
then
outcome = 12.3277068 + 0.024 slp - 0.04 tmpn
15. Cubist Model
Using R for Digital Soil Mapping - McBratney et al, 2016
Rule 2: [767 cases, mean 1.5268759, range 0 to 6.617897, est err
0.5753750]
if
dem <= 576
tmpd > 291
then
outcome = -1.5564033 + 0.00182 dem + 0.0024 prec + 0.011 twi
Rule 3: [437 cases, mean 2.1162884, range 0 to 7.916972, est err
0.7406893]
if
dem > 425
dem <= 1120
tmpd <= 291
then
outcome = 64.0021631 - 0.215 tmpd + 0.02 slp
16. Cubist Model
Using R for Digital Soil Mapping - McBratney et al, 2016
Rule 4: [228 cases, mean 3.9560454, range 0 to 13.29358, est err
1.3056889]
if
dem > 1120
tmpd <= 291
then
outcome = 24.2396342 + 0.00185 dem - 0.079 tmpd - 0.006 twi
Rule 5: [20 cases, mean 10.2624750, range 0.9238458 to 50.33235, est
err 9.4497499]
if
dem <= 425
tmpd <= 291
then
outcome = 2.414286
17. Cubist Model
Using R for Digital Soil Mapping - McBratney et al, 2016
> RMSE <- sqrt(mean((mdata$Value - PredictedC)^2))
> RMSE
[1] 1.915229
> bias <- mean(PredictedC) - mean(mdata$Value)
> bias
[1] -0.2119047
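Let's see how well it validates. Note that the RMSE and bias above are computed on the training rows (mdata), so they measure goodness of fit rather than independent validation.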
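For an independent check, the same two statistics can be computed on rows that were left out of the training sample. The sketch below wraps the formulas from the slide in two helper functions (`rmse` and `bias` are names chosen here, not from the source) and exercises them on toy vectors; the commented lines show how they might be applied to the held-out rows, assuming `training` and `ModelC` exist as in the earlier slides.

```r
# Root mean square error between observed and predicted values
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

# Bias as defined on the slide: mean prediction minus mean observation
bias <- function(obs, pred) mean(pred) - mean(obs)

# Toy vectors standing in for held-out observations and predictions
obs_demo  <- c(1.0, 2.0, 3.0, 4.0)
pred_demo <- c(1.5, 2.0, 2.5, 5.0)

rmse_demo <- rmse(obs_demo, pred_demo)
bias_demo <- bias(obs_demo, pred_demo)

# With the objects from the slides this could read (not run here):
# vdata      <- DSM_table2[-training, ]
# PredictedV <- predict(ModelC, newdata = vdata)
# rmse(vdata$Value, PredictedV)
# bias(vdata$Value, PredictedV)
```

Validation statistics that are much worse than the training ones would suggest the model is overfitting.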
18. Cubist Model
Using R for Digital Soil Mapping - McBratney et al, 2016
MapSOCC <- predict(covStack, ModelC, filename = "carbonMC_Cubist.tif",
format = "GTiff", datatype = "FLT4S", overwrite = TRUE)
Creating the map from the fitted model (ModelC)
can be implemented as before (random forest)
using the raster predict function. Note that the
model object is passed, not the vector of predictions.