GSP - Asian Soil Partnership
Training Workshop on Soil Organic Carbon Mapping
Bangkok, Thailand, 24-29 April 2017
Yusuf YIGINI, PhD - FAO, Land and Water Division (CBL)
DAY 5 – 28 April 2017
TIME            TOPIC                                     INSTRUCTORS
8:30 - 10:30    Exploratory Data Analysis                 Dr. Yusuf Yigini, FAO
                Hands-on: Basic Spatial Operations        Dr. Ate Poortinga
                                                          Dr. Lucrezia Caon, FAO
10:30 - 11:00   COFFEE BREAK
11:00 - 13:00   Linear Models
                Hands-on: Linear Models
13:00 - 14:00   LUNCH
14:00 - 16:00   Modelling Soil Properties
                R - Spatial Multiple Linear Regression
                R - Random Forests
16:00 - 16:30   COFFEE BREAK
16:30 - 17:30   Hands-on
Random Forest
Random Forests is an increasingly popular data mining algorithm in
digital soil mapping (DSM), in soil science, and in the applied
sciences generally. In R the algorithm is provided by the
randomForest package and can be used for both regression and
classification.
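A Random Forest is an ensemble of decision trees built by bagging
(bootstrap aggregation), not boosting: each tree is grown on a
bootstrap sample of the data, only a random subset of the predictors
is tried at each split, and the predictions of all trees are averaged.
Fitting a Random Forest model in R is relatively straightforward. It
is worth consulting the R help files for the randomForest package and
its functions.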
We will use the randomForest() function and a couple of
extractor functions to examine the model-fitting
diagnostics. First we use the sample() function to
randomly split the data into two parts: 70% for training and
30% for testing.
> library(randomForest)
> DSM_table2 <- read.csv("DSM_table2.csv")
> training <- sample(nrow(DSM_table2), floor(0.7 * nrow(DSM_table2)))
> modelF <- randomForest(Value ~ dem + twi + slp + tmpd + tmpn,
+     data = DSM_table2[training, ], importance = TRUE, ntree = 1000)
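The split above is random, so each run selects different rows. A minimal sketch of a repeatable 70/30 split follows. It uses a synthetic table in place of DSM_table2; the names soil, n_train, train_set and test_set and the seed value are assumptions made for illustration only.

```r
# Synthetic stand-in for DSM_table2 (hypothetical values, illustration only)
set.seed(42)                                  # arbitrary seed: same split on every run
soil <- data.frame(Value = rnorm(100), dem = runif(100))

n_train  <- floor(0.7 * nrow(soil))           # whole number of training rows
training <- sample(nrow(soil), n_train)       # row indices drawn without replacement

train_set <- soil[training, ]                 # rows used to fit the model
test_set  <- soil[-training, ]                # remaining rows, held back for validation

nrow(train_set)
nrow(test_set)
```

Negative indexing with -training selects exactly the rows that were not sampled, so the two sets never overlap.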
The print() function gives a quick assessment of the model fit.
> print(modelF)
Call:
randomForest(formula = Value ~ dem + twi + slp + tmpd + tmpn,
data = DSM_table2[training, ], importance = TRUE, ntree = 1000)
Type of random forest: regression
Number of trees: 1000
No. of variables tried at each split: 1
Mean of squared residuals: 1.801046
% Var explained: 59.35
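The two summary figures in this printout can also be pulled out of the fitted object. Both are computed from out-of-bag (OOB) predictions, and they are linked through the variance of the response. The sketch below shows this on synthetic data; the data frame toy and its columns are hypothetical, not the workshop data.

```r
library(randomForest)

set.seed(1)                                   # arbitrary seed for repeatability
n   <- 200
toy <- data.frame(x1 = runif(n), x2 = runif(n), x3 = runif(n))
toy$y <- 3 * toy$x1 + sin(6 * toy$x2) + rnorm(n, sd = 0.3)

rf <- randomForest(y ~ x1 + x2 + x3, data = toy,
                   importance = TRUE, ntree = 500)

oob_mse <- tail(rf$mse, 1)   # printed as "Mean of squared residuals"
oob_rsq <- tail(rf$rsq, 1)   # printed (times 100) as "% Var explained"

# % Var explained = 1 - OOB MSE / variance of y (variance with divisor n)
var_y <- var(toy$y) * (n - 1) / n
all.equal(oob_rsq, 1 - oob_mse / var_y)

sqrt(oob_mse)                # OOB RMSE, in the units of the response
importance(rf)               # columns %IncMSE and IncNodePurity
```

Applied to the printout above, the square root of 1.801046 gives an OOB RMSE of roughly 1.34, which can be compared with the RMSE obtained on the testing set.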
To judge how well the model predicts, we compare the observed
values in the testing set with the model's predictions for them.
Some of the more common quality measures are the root mean square
error (RMSE), the bias, and the R² value.
> Predicted <- predict(modelF, newdata =
DSM_table2[-training, ])
> RMSE <- sqrt(mean((DSM_table2$Value[-training] - Predicted)^2))
> RMSE
[1] 1.249491
> lm <- lm(Predicted~ DSM_table2$Value[-training])
> summary(lm)[["r.squared"]]
[1] 0.6079515
> bias <- mean(Predicted) - mean(DSM_table2$Value[-training])
> bias
[1] 0.01450241
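The three measures can be wrapped in one small helper so that different models are scored in the same way. This is a sketch only: the function name validation_stats and the example vectors are hypothetical. It relies on the fact that the R² of a simple linear regression equals the squared Pearson correlation.

```r
# Hypothetical helper: validation statistics for observed vs. predicted values
validation_stats <- function(observed, predicted) {
  c(RMSE = sqrt(mean((observed - predicted)^2)),   # typical size of the error
    bias = mean(predicted) - mean(observed),       # systematic over/under-prediction
    R2   = cor(observed, predicted)^2)             # same as summary(lm(...))$r.squared
}

# Made-up numbers, for illustration only
obs  <- c(1.2, 2.5, 3.1, 4.8, 5.0)
pred <- c(1.0, 2.9, 3.0, 4.1, 5.9)

validation_stats(obs, pred)
```

With the workshop objects the call would be validation_stats(DSM_table2$Value[-training], Predicted).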
Plot the predictions against the observed values of the testing set:
plot(DSM_table2$Value[-training], Predicted)
abline(a = 0, b = 1, lty = 2, col = "red")
abline(lm, col = "blue")
Red dashed line: 1:1 comparison.
Blue line: regression of predicted on observed values.
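Final Steps - Random Forest
library(raster)
# List the covariate GeoTIFFs; the layer names must match the model predictors
Covs <- list.files(path = "C:/mc/covs", pattern = "\\.tif$", full.names = TRUE)
covStack <- stack(Covs)
# Apply the model to every cell and write the result to a GeoTIFF file
MapSoc <- predict(covStack, modelF, filename = "SOCMAPofMAcedonia",
                  format = "GTiff", datatype = "FLT4S", overwrite = TRUE)
plot(MapSoc, main = "Random Forest model predicted 0-30 cm SOC Map of Macedonia (%)")
[Figure: Random Forest model predicted 0-30 cm SOC map]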
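The prediction step can be tried without the covariate files. The sketch below builds two small synthetic layers and applies a fitted model to them with raster's predict(). It makes several assumptions: the layers and the point table pts are invented, and a plain lm() model stands in for the Random Forest to keep the example short. A randomForest model is passed to predict() in the same way.

```r
library(raster)

set.seed(7)                                   # arbitrary seed for repeatability

# Two synthetic 10 x 10 covariate layers (hypothetical values)
dem <- raster(nrows = 10, ncols = 10); values(dem) <- runif(100, 0, 2000)
twi <- raster(nrows = 10, ncols = 10); values(twi) <- runif(100, 2, 20)
covStack <- stack(dem, twi)
names(covStack) <- c("dem", "twi")            # must match the model's predictor names

# Synthetic point table with the same column names as the layers
pts <- data.frame(dem = runif(50, 0, 2000), twi = runif(50, 2, 20))
pts$Value <- 5 - 0.001 * pts$dem + 0.1 * pts$twi + rnorm(50, sd = 0.2)

fit <- lm(Value ~ dem + twi, data = pts)      # stand-in model (assumption)

# Predict cell by cell; with no filename the result stays in memory
socMap <- predict(covStack, fit)
plot(socMap, main = "Predicted values on synthetic covariates")
```

If a layer name does not match a predictor in the model, predict() cannot find that variable, so it is worth checking names(covStack) before predicting.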
Exercise
Produce and plot the Soil Organic Carbon map using your
Multiple Linear Regression model, then share it here:
https://goo.gl/jKxHfN
