A comparison of learning methods to predict N2O
fluxes and N leaching
Nathalie Villa-Vialaneix
nathalie.villa@toulouse.inra.fr
http://www.nathalievilla.org
Workshop on N2O meta-modelling, March 9th, 2015
INRA, Toulouse
Outline
1 DNDC-Europe model description
2 Methodology
Underfitting / Overfitting
Consistency
Problem at stake
3 Presentation of the different methods
4 Results
General overview
Modern issues in agriculture:
fight against the food crisis;
while preserving the environment.
The EC needs simulation tools to:
link direct aids to the respect of standards ensuring proper management;
quantify the environmental impact of European policies (“Cross Compliance”).
Cross Compliance Assessment Tool
DNDC is a biogeochemical model.
Zoom on DNDC-EUROPE
Moving from DNDC-Europe to metamodeling
Needs for metamodeling:
easier integration into CCAT;
faster execution and more responsive scenario analysis.
Data [Villa-Vialaneix et al., 2012]
Data extracted from the biogeochemical simulator DNDC-EUROPE: ∼19,000 HSMU (Homogeneous Soil Mapping Units; nominal size 1 km², but the actual area varies widely) used for corn cultivation:
corn corresponds to 4.6% of the UAA;
HSMU for which at least 10% of the agricultural land was used for corn were selected.
11 input (explanatory) variables (selected by experts and previous simulations):
N_FR (N input through fertilization; kg/ha/y);
N_MR (N input through manure spreading; kg/ha/y);
Nfix (N input from biological fixation; kg/ha/y);
Nres (N input from root residue; kg/ha/y);
BD (bulk density; g/cm³);
SOC (soil organic carbon in topsoil; mass fraction);
PH (soil pH);
Clay (ratio of soil clay content);
Rain (annual precipitation; mm/y);
Tmean (annual mean temperature; °C);
Nr (concentration of N in rain; ppm).
2 outputs to be estimated (independently) from the inputs:
N2O fluxes (greenhouse gas);
N leaching (a major cause of water pollution).
2 Methodology
Regression
Consider the problem where:
$Y \in \mathbb{R}$ has to be estimated from $X \in \mathbb{R}^d$;
we are given a learning set, i.e., $n$ i.i.d. observations of $(X, Y)$: $(x_1, y_1), \ldots, (x_n, y_n)$.
Example: predict N2O fluxes from pH, climate, concentration of N in rain, fertilization... for a large number of HSMU.
Basics
From $(x_i, y_i)_i$, definition of a machine $\Phi_n$ s.t.:
$$\hat{y}_{\text{new}} = \Phi_n(x_{\text{new}}).$$
if $Y$ is numeric, $\Phi_n$ is called a regression function;
if $Y$ is a factor, $\Phi_n$ is called a classifier;
$\Phi_n$ is said to be trained or learned from the observations $(x_i, y_i)_i$.
Desirable properties
accuracy to the observations: predictions made on known data are close to the observed values;
generalization ability: predictions made on new data are also accurate.
Conflicting objectives!! [Vapnik, 1995]
Underfitting/Overfitting
[Figure sequence: the function x → y to be estimated; observations we might have; observations we do have; a first estimation from the observations (underfitting); a second estimation (accurate); a third estimation (overfitting); summary.]
Errors
training error (measures the accuracy to the observations):
if $y$ is a factor: misclassification rate
$$\frac{\#\{i : \hat{y}_i \neq y_i,\; i = 1, \ldots, n\}}{n}$$
if $y$ is numeric: mean squared error (MSE)
$$\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$$
or root mean squared error (RMSE), or pseudo-$R^2$: $1 - \mathrm{MSE}/\mathrm{Var}((y_i)_i)$.
test error: a way to prevent overfitting (it estimates the generalization error) is simple validation:
1 split the data into training/test sets (usually 80%/20%);
2 train $\Phi_n$ on the training dataset;
3 calculate the test error on the remaining data.
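To make these definitions concrete, here is a minimal Python sketch (an illustration added for this write-up, not the talk's code) computing the training/test split and the error measures, with synthetic data standing in for the DNDC inputs:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 11))                 # 11 inputs, as in the DNDC data
y = X[:, 0] ** 2 + np.log1p(X[:, 1]) + rng.normal(scale=0.1, size=1000)

# 1. simple validation: 80%/20% training/test split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 2. train a machine Phi_n on the training set
model = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)

# 3. compute the test error on the held-out data
y_hat = model.predict(X_te)
mse = np.mean((y_hat - y_te) ** 2)               # mean squared error
rmse = np.sqrt(mse)                              # root mean squared error
pseudo_r2 = 1 - mse / np.var(y_te)               # 1 - MSE / Var((y_i)_i)
print(f"MSE={mse:.4f}  RMSE={rmse:.4f}  pseudo-R2={pseudo_r2:.4f}")
```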
Example
[Figure sequence: observations; training/test datasets; training/test errors; summary.]
Consistency in the parametric/nonparametric case
Example in the parametric framework (linear methods):
an assumption is made on the form of the relation between X and Y:
$$Y = \beta^T X + \epsilon$$
$\beta$ is estimated from the observations $(x_1, y_1), \ldots, (x_n, y_n)$ by a given method which calculates a $\beta_n$.
The estimation is said to be consistent if $\beta_n \xrightarrow{n \to +\infty} \beta$, under (possible) technical assumptions on $X$, $\epsilon$, $Y$.
Example in the nonparametric framework:
the form of the relation between X and Y is unknown:
$$Y = \Phi(X) + \epsilon$$
$\Phi$ is estimated from the observations $(x_1, y_1), \ldots, (x_n, y_n)$ by a given method which calculates a $\Phi_n$.
The estimation is said to be consistent if $\Phi_n \xrightarrow{n \to +\infty} \Phi$, under (possible) technical assumptions on $X$, $\epsilon$, $Y$.
Consistency from the statistical learning perspective [Vapnik, 1995]
Question: are we really interested in estimating Φ or... rather in having the smallest prediction error?
Statistical learning perspective: a method that builds a machine $\Phi_n$ from the observations is said to be (universally) consistent if, given a risk function $R : \mathbb{R} \times \mathbb{R} \to \mathbb{R}^+$ (which calculates an error),
$$E\left(R(\Phi_n(X), Y)\right) \xrightarrow{n \to +\infty} \inf_{\Phi : \mathcal{X} \to \mathbb{R}} E\left(R(\Phi(X), Y)\right),$$
for any distribution of $(X, Y) \in \mathcal{X} \times \mathbb{R}$.
Definitions: $L^* = \inf_{\Phi : \mathcal{X} \to \mathbb{R}} E(R(\Phi(X), Y))$ and $L_\Phi = E(R(\Phi(X), Y))$.
Purpose of the work
We focus on methods that are universally consistent. These methods lead to the definition of machines $\Phi_n$ such that
$$E\, R(\Phi_n(X), Y) \xrightarrow{n \to +\infty} L^* = \inf_{\Phi : \mathbb{R}^d \to \mathbb{R}} L_\Phi$$
for any random pair $(X, Y)$:
1 multi-layer perceptrons (neural networks) [Bishop, 1995];
2 Support Vector Machines (SVM) [Boser et al., 1992];
3 random forests [Breiman, 2001] (universal consistency is not proven in this case).
Methodology
Purpose: comparison of several metamodeling approaches (accuracy, computational time...).
For every data set, every output and every method:
1 the data set was split into a training set and a test set (on an 80%/20% basis);
2 the regression function was learned from the training set (with a full validation process for the hyperparameter tuning);
3 the performances were calculated on the test set: predictions were made from the inputs and compared to the true outputs.
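A Python sketch of this protocol (scikit-learn's GridSearchCV stands in for the "full validation process"; the model, grid and data below are illustrative assumptions, not the talk's actual setup):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(size=(2000, 11))
y = 5 * X[:, 0] + np.sin(6 * X[:, 4]) + rng.normal(scale=0.1, size=2000)

# 1. split into training and test sets (80%/20%)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

# 2. learn the regression function, tuning hyperparameters by validation
pipe = make_pipeline(StandardScaler(), SVR(kernel="rbf", epsilon=0.1))
grid = {"svr__C": [1, 10, 100], "svr__gamma": [0.01, 0.1, 1]}
search = GridSearchCV(pipe, grid, scoring="neg_mean_squared_error", cv=5)
search.fit(X_tr, y_tr)

# 3. performance on the test set (pseudo-R2)
mse = np.mean((search.predict(X_te) - y_te) ** 2)
print("pseudo-R2:", 1 - mse / np.var(y_te))
```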
Methods
2 linear models:
one with the 11 explanatory variables;
one with the 11 explanatory variables plus several nonlinear transformations of these variables (square, log...); stepwise AIC was used to train the model;
multilayer perceptrons (MLP);
support vector machines (SVM);
random forests (RF);
3 approaches based on splines: ACOSSO (ANOVA splines), SDR (an improvement of the former) and DACE (a kriging-based approach).
3 Presentation of the different methods
Multilayer perceptrons (MLP)
A “one-hidden-layer perceptron” takes the form:
$$\Phi_w : x \in \mathbb{R}^d \mapsto \sum_{i=1}^{Q} w_i^{(2)}\, G\!\left(x^T w_i^{(1)} + w_i^{(0)}\right) + w_0^{(2)}$$
where:
the $w$ are the weights of the MLP, which have to be learned from the learning set;
$G$ is a given activation function: typically, $G(z) = \frac{1 - e^{-z}}{1 + e^{-z}}$;
$Q$ is the number of neurons on the hidden layer. It controls the flexibility of the MLP and is a hyperparameter that is usually tuned during the learning process.
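To make the formula concrete, a numpy sketch of the forward pass (the weights are random here rather than learned; the shapes are assumptions consistent with the formula above):

```python
import numpy as np

def mlp_forward(x, W1, w0, w2, b2):
    """One-hidden-layer perceptron: sum_i w2[i] * G(x . W1[i] + w0[i]) + b2."""
    G = lambda z: (1 - np.exp(-z)) / (1 + np.exp(-z))   # activation; equals tanh(z/2)
    hidden = G(W1 @ x + w0)                             # Q hidden-neuron outputs
    return w2 @ hidden + b2                             # second layer + output bias

d, Q = 11, 5                                            # 11 inputs, Q hidden neurons
rng = np.random.default_rng(2)
W1, w0 = rng.normal(size=(Q, d)), rng.normal(size=Q)    # weights w^(1), w^(0)
w2, b2 = rng.normal(size=Q), rng.normal()               # weights w^(2), w_0^(2)
print(mlp_forward(rng.uniform(size=d), W1, w0, w2, b2))
```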
Symbolic representation of MLP
[Diagram: inputs $x_1, \ldots, x_d$ feed $Q$ hidden neurons through the first-layer weights $w^{(1)}$; the hidden outputs are combined through the second-layer weights $w^{(2)}$, plus a bias, to produce $\Phi_w(x)$.]
Learning MLP
Learning the weights: the $w$ are learned by a mean squared error minimization scheme, penalized by a weight decay to avoid overfitting (and ensure a better generalization ability):
$$w^* = \arg\min_w \sum_{i=1}^{n} L(y_i, \Phi_w(x_i)) + C \|w\|^2.$$
Problem: the MSE is not quadratic in $w$, so some solutions can be local minima.
Tuning the hyperparameters $C$ and $Q$: simple validation was used, tuning first $C$ and then $Q$.
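In scikit-learn terms (a sketch; the talk's own implementation is not specified here), the weight decay C corresponds to the alpha penalty of MLPRegressor and Q to hidden_layer_sizes, both tunable by validation:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(3)
X = rng.uniform(size=(500, 11))
y = X[:, 0] - 2 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=500)

# Q <-> hidden_layer_sizes=(Q,); weight decay C <-> alpha.
# G(z) = (1 - e^-z)/(1 + e^-z) equals tanh(z/2), so 'tanh' is the closest built-in.
mlp = MLPRegressor(activation="tanh", solver="lbfgs", max_iter=5000, random_state=0)
grid = {"hidden_layer_sizes": [(5,), (10,), (20,)], "alpha": [1e-4, 1e-2, 1.0]}
search = GridSearchCV(mlp, grid, scoring="neg_mean_squared_error", cv=5).fit(X, y)
print(search.best_params_)
```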
SVM
SVM is also an algorithm based on penalized error loss minimization:
1 Basic linear SVM for regression: $\Phi_{(w,b)}$ is of the form $x \mapsto w^T x + b$, with $(w, b)$ the solution of
$$\arg\min_{(w,b)} \sum_{i=1}^{n} L_\epsilon\left(y_i, \Phi_{(w,b)}(x_i)\right) + \lambda \|w\|^2$$
where
$\lambda$ is a regularization (hyper)parameter (to be tuned);
$L_\epsilon(y, \hat{y}) = \max\{|y - \hat{y}| - \epsilon, 0\}$ is the $\epsilon$-insensitive loss function (see the appendix).
2 Nonlinear SVM for regression is the same, except that a nonlinear (fixed) transformation of the inputs is applied beforehand: $\varphi(x) \in \mathcal{H}$ is used instead of $x$.
[Figure: a nonlinear map $\varphi$ from the original space $\mathcal{X}$ to the feature space $\mathcal{H}$.]
Kernel trick: in fact, $\varphi$ is never made explicit but is used through a kernel $K : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$, with $K(x_i, x_j) = \langle \varphi(x_i), \varphi(x_j) \rangle$.
Common kernel: the Gaussian kernel
$$K_\gamma(u, v) = e^{-\gamma \|u - v\|^2}$$
is known to have good theoretical properties, both for accuracy and generalization.
Learning SVM
Learning $(w, b)$: $w = \sum_{i=1}^{n} \alpha_i K(x_i, \cdot)$ and $b$ are calculated by an exact optimization scheme (quadratic programming). The only step that can be time-consuming is the calculation of the kernel matrix, $K(x_i, x_j)$ for $i, j = 1, \ldots, n$.
The resulting $\Phi_n$ is known to be of the form
$$\Phi_n(x) = \sum_{i=1}^{n} \alpha_i K(x_i, x) + b$$
where only a few $\alpha_i$ are nonzero. The corresponding $x_i$ are called support vectors.
Tuning the hyperparameters $C = 1/\lambda$ and $\gamma$: simple validation was used. To save computation time, $\epsilon$ was not tuned in our experiments but set to the default value (1), which ensured at most $0.5n$ support vectors.
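A short SVR sketch (scikit-learn; the data and the C and gamma values are illustrative, with epsilon fixed to 1 as in the talk's experiments):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(4)
X = rng.uniform(size=(500, 11))
y = 3 * X[:, 0] + np.cos(4 * X[:, 2]) + rng.normal(scale=0.1, size=500)

# Gaussian (RBF) kernel: K_gamma(u, v) = exp(-gamma * ||u - v||^2);
# epsilon = 1 keeps the number of support vectors down, as noted above
svr = SVR(kernel="rbf", C=10.0, gamma=0.1, epsilon=1.0).fit(X, y)
print("support vectors:", svr.support_vectors_.shape[0], "out of", len(y))
```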
From regression tree to random forest
Example of a regression tree:
[Figure: a fitted regression tree with splits on SOCt, PH, FR and clay (e.g., SOCt < 0.095, PH < 7.815, FR < 130.45) and leaf predictions ranging from 2.685 to 59.330.]
Each split is made such that the two induced subsets have the greatest possible homogeneity. The prediction of a final node is the mean of the Y values of the observations belonging to this node.
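A minimal sketch of such a tree (scikit-learn on synthetic data; the export_text dump plays the role of the tree figure):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(5)
X = rng.uniform(size=(300, 2))
y = 10 * (X[:, 0] > 0.5) + 3 * X[:, 1] + rng.normal(scale=0.1, size=300)

# each split minimizes within-node variance (greatest homogeneity);
# each leaf predicts the mean y of the observations it contains
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(export_text(tree, feature_names=["x0", "x1"]))
```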
Random forest
Basic principle: combination of a large number of under-efficient (weak) regression trees; the prediction is the mean prediction over all trees.
For each tree, two simplifications of the original method are performed:
1 a given number of observations is randomly chosen from the training set: this subset of the training data is called the in-bag sample, whereas the other observations are called out-of-bag and are used to control the error of the tree;
2 for each node of the tree, a given number of variables is randomly chosen among all possible explanatory variables.
The best split is then calculated on the basis of these variables and of the chosen observations. The chosen observations are the same within a given tree, whereas the variables taken into account change at each split.
Additional tools
OOB (out-of-bag) error: error based on the OOB predictions. Stabilization of the OOB error is a good indication that there are enough trees in the forest.
Importance of a variable, to help interpretation: for a given variable $X^j$ ($j \in \{1, \ldots, p\}$), the importance of $X^j$ is the mean decrease in accuracy obtained when the values of $X^j$ are randomized:
$$I(X^j) = E\left(R(\Phi_n(X^{(j)}), Y)\right) - E\left(R(\Phi_n(X), Y)\right)$$
in which $X^{(j)} = (X^1, \ldots, \tilde{X}^j, \ldots, X^p)$, $\tilde{X}^j$ being the variable $X^j$ with permuted values. The importance is estimated with the OOB observations (see the algorithm below).
Learning a random forest
Random forests are not very sensitive to the hyperparameters (number of observations for each tree, number of variables for each split): the default values have been used.
The number of trees should be large enough for the mean squared error based on out-of-bag observations to stabilize:
[Figure: out-of-bag (training) and test errors versus the number of trees (0 to 500); both curves stabilize after a few hundred trees.]
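A sketch of this check with scikit-learn (oob_score=True exposes the out-of-bag predictions; the data and tree counts are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(6)
X = rng.uniform(size=(1000, 11))
y = 4 * X[:, 0] * X[:, 1] + X[:, 2] + rng.normal(scale=0.1, size=1000)

# track the OOB error as trees are added; it should stabilize
for n_trees in (50, 100, 200, 300, 500):
    rf = RandomForestRegressor(n_estimators=n_trees, oob_score=True,
                               random_state=0).fit(X, y)
    oob_mse = np.mean((rf.oob_prediction_ - y) ** 2)
    print(f"{n_trees:4d} trees: OOB MSE = {oob_mse:.4f}")
```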
Importance estimation in random forests
OOB estimation for variable $X^j$ (with $\bar{T}^b$ the OOB sample of tree $b$ and $\Phi^b$ the tree's prediction):
1: for $b = 1 \to B$ (loop over the trees) do
2: permute the values $(x_i^j)_i$ over $\bar{T}^b$: for $x_i \in \bar{T}^b$, set $x_i^{(j,b)} = (x_i^1, \ldots, \tilde{x}_i^j, \ldots, x_i^p)$, with $(\tilde{x}_i^j)_i$ the permuted values
3: predict $\Phi^b\!\left(x_i^{(j,b)}\right)$
4: end for
5: return the OOB estimation of the importance of $X^j$:
$$\frac{1}{B} \sum_{b=1}^{B} \left[ \frac{1}{|\bar{T}^b|} \sum_{x_i \in \bar{T}^b} \left( \Phi^b(x_i^{(j,b)}) - y_i \right)^2 - \frac{1}{|\bar{T}^b|} \sum_{x_i \in \bar{T}^b} \left( \Phi^b(x_i) - y_i \right)^2 \right]$$
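The same idea can be approximated with scikit-learn's permutation_importance, a simplification of the algorithm above since it permutes on a held-out set rather than on per-tree OOB samples:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(7)
X = rng.uniform(size=(1000, 11))
y = 6 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.1, size=1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)

# mean increase in MSE when each variable is permuted in turn
imp = permutation_importance(rf, X_te, y_te, n_repeats=10,
                             scoring="neg_mean_squared_error", random_state=0)
print(np.argsort(imp.importances_mean)[::-1])   # variables ranked by importance
```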
4 Results
Influence of the training sample size
[Figure: test $R^2$ versus log training-set size for the N2O prediction ($R^2$ from about 0.5 to 1.0) and for the N leaching prediction ($R^2$ from about 0.6 to 1.0); methods compared: LM1, LM2, Dace, SDR, ACOSSO, MLP, SVM, RF.]
Computational time
Use          LM1     LM2      Dace     SDR       ACOSSO
Train        <1 s    50 min   80 min   4 hours   65 min
Prediction   <1 s    <1 s     90 s     14 min    4 min

Use          MLP         SVM       RF
Train        2.5 hours   5 hours   15 min
Prediction   1 s         20 s      5 s

Time for DNDC: about 200 hours on a desktop computer and about 2 days using a cluster!
Further comparisons
Evaluation of the different steps (time/difficulty):
          Training   Validation   Test
LM1       ++                      +
LM2       +                       +
ACOSSO    =          +            -
SDR       =          +            -
DACE      =          -            -
MLP       -          -            +
SVM       =          -            -
RF        +          +            +
Understanding which inputs are important
Example (N2O, RF):
[Figure: variable importance (mean decrease in MSE) by rank for SOC, pH, Nr, N_MR, Nfix, N_FR, clay, Nres, Tmean, BD and rain; SOC and pH stand out.]
The variables SOC and PH are the most important for accurate predictions.
Example (N leaching, SVM):
[Figure: variable importance (decrease in MSE) by rank for N_FR, Nres, pH, Nr, clay, rain, SOC, Tmean, Nfix, BD and N_MR.]
The variables N_MR, N_FR, Nres and pH are the most important for accurate predictions.
Thank you for your attention...
... questions?
Bishop, C. (1995). Neural Networks for Pattern Recognition. Oxford University Press, New York, USA.
Boser, B., Guyon, I., and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual ACM Workshop on COLT (D. Haussler, editor), pages 144–152. ACM Press.
Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.
Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer Verlag, New York, USA.
Villa-Vialaneix, N., Follador, M., Ratto, M., and Leip, A. (2012). A comparison of eight metamodeling techniques for the simulation of N2O fluxes and N leaching from corn crops. Environmental Modelling and Software, 34:51–66.
Appendix: ε-insensitive loss function
[Figure: the ε-insensitive loss $L_\epsilon(y, \hat{y}) = \max\{|y - \hat{y}| - \epsilon, 0\}$: zero on $[-\epsilon, \epsilon]$, linear beyond.]
More Related Content

What's hot

Lesson 4: Calculating Limits (Section 21 slides)
Lesson 4: Calculating Limits (Section 21 slides)Lesson 4: Calculating Limits (Section 21 slides)
Lesson 4: Calculating Limits (Section 21 slides)Matthew Leingang
 
Machine Learning for Actuaries
Machine Learning for ActuariesMachine Learning for Actuaries
Machine Learning for ActuariesArthur Charpentier
 
Slides econometrics-2018-graduate-1
Slides econometrics-2018-graduate-1Slides econometrics-2018-graduate-1
Slides econometrics-2018-graduate-1Arthur Charpentier
 
Learning from (dis)similarity data
Learning from (dis)similarity dataLearning from (dis)similarity data
Learning from (dis)similarity datatuxette
 
Slides econometrics-2018-graduate-4
Slides econometrics-2018-graduate-4Slides econometrics-2018-graduate-4
Slides econometrics-2018-graduate-4Arthur Charpentier
 
Reliable ABC model choice via random forests
Reliable ABC model choice via random forestsReliable ABC model choice via random forests
Reliable ABC model choice via random forestsChristian Robert
 
(Approximate) Bayesian computation as a new empirical Bayes (something)?
(Approximate) Bayesian computation as a new empirical Bayes (something)?(Approximate) Bayesian computation as a new empirical Bayes (something)?
(Approximate) Bayesian computation as a new empirical Bayes (something)?Christian Robert
 
Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013
Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013
Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013Christian Robert
 
Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Frank Nielsen
 
Nonlinear Manifolds in Computer Vision
Nonlinear Manifolds in Computer VisionNonlinear Manifolds in Computer Vision
Nonlinear Manifolds in Computer Visionzukun
 
Quadratics in vertex form
Quadratics in vertex formQuadratics in vertex form
Quadratics in vertex formDouglas Agyei
 
When is undersampling effective in unbalanced classification tasks?
When is undersampling effective in unbalanced classification tasks?When is undersampling effective in unbalanced classification tasks?
When is undersampling effective in unbalanced classification tasks?Andrea Dal Pozzolo
 
t-tests in R - Lab slides for UGA course FANR 6750
t-tests in R - Lab slides for UGA course FANR 6750t-tests in R - Lab slides for UGA course FANR 6750
t-tests in R - Lab slides for UGA course FANR 6750richardchandler
 
01 graphical models
01 graphical models01 graphical models
01 graphical modelszukun
 

What's hot (20)

Lausanne 2019 #1
Lausanne 2019 #1Lausanne 2019 #1
Lausanne 2019 #1
 
Varese italie seminar
Varese italie seminarVarese italie seminar
Varese italie seminar
 
Side 2019 #9
Side 2019 #9Side 2019 #9
Side 2019 #9
 
Lesson 4: Calculating Limits (Section 21 slides)
Lesson 4: Calculating Limits (Section 21 slides)Lesson 4: Calculating Limits (Section 21 slides)
Lesson 4: Calculating Limits (Section 21 slides)
 
Machine Learning for Actuaries
Machine Learning for ActuariesMachine Learning for Actuaries
Machine Learning for Actuaries
 
Slides econometrics-2018-graduate-1
Slides econometrics-2018-graduate-1Slides econometrics-2018-graduate-1
Slides econometrics-2018-graduate-1
 
Learning from (dis)similarity data
Learning from (dis)similarity dataLearning from (dis)similarity data
Learning from (dis)similarity data
 
Slides econometrics-2018-graduate-4
Slides econometrics-2018-graduate-4Slides econometrics-2018-graduate-4
Slides econometrics-2018-graduate-4
 
Reliable ABC model choice via random forests
Reliable ABC model choice via random forestsReliable ABC model choice via random forests
Reliable ABC model choice via random forests
 
QMC: Transition Workshop - Approximating Multivariate Functions When Function...
QMC: Transition Workshop - Approximating Multivariate Functions When Function...QMC: Transition Workshop - Approximating Multivariate Functions When Function...
QMC: Transition Workshop - Approximating Multivariate Functions When Function...
 
(Approximate) Bayesian computation as a new empirical Bayes (something)?
(Approximate) Bayesian computation as a new empirical Bayes (something)?(Approximate) Bayesian computation as a new empirical Bayes (something)?
(Approximate) Bayesian computation as a new empirical Bayes (something)?
 
Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013
Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013
Discussion of ABC talk by Francesco Pauli, Padova, March 21, 2013
 
Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...Pattern learning and recognition on statistical manifolds: An information-geo...
Pattern learning and recognition on statistical manifolds: An information-geo...
 
Nonlinear Manifolds in Computer Vision
Nonlinear Manifolds in Computer VisionNonlinear Manifolds in Computer Vision
Nonlinear Manifolds in Computer Vision
 
MUMS: Transition & SPUQ Workshop - Some Strategies to Quantify Uncertainty fo...
MUMS: Transition & SPUQ Workshop - Some Strategies to Quantify Uncertainty fo...MUMS: Transition & SPUQ Workshop - Some Strategies to Quantify Uncertainty fo...
MUMS: Transition & SPUQ Workshop - Some Strategies to Quantify Uncertainty fo...
 
Quadratics in vertex form
Quadratics in vertex formQuadratics in vertex form
Quadratics in vertex form
 
When is undersampling effective in unbalanced classification tasks?
When is undersampling effective in unbalanced classification tasks?When is undersampling effective in unbalanced classification tasks?
When is undersampling effective in unbalanced classification tasks?
 
t-tests in R - Lab slides for UGA course FANR 6750
t-tests in R - Lab slides for UGA course FANR 6750t-tests in R - Lab slides for UGA course FANR 6750
t-tests in R - Lab slides for UGA course FANR 6750
 
01 graphical models
01 graphical models01 graphical models
01 graphical models
 
Linear regression
Linear regressionLinear regression
Linear regression
 

Similar to A comparison of learning methods to predict N2O fluxes and N leaching

A short introduction to statistical learning
A short introduction to statistical learningA short introduction to statistical learning
A short introduction to statistical learningtuxette
 
A comparison of learning methods to predict N2O fluxes and N leaching
A comparison of learning methods to predict N2O fluxes and N leachingA comparison of learning methods to predict N2O fluxes and N leaching
A comparison of learning methods to predict N2O fluxes and N leachingtuxette
 
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Daniel Valcarce
 
An introduction to neural networks
An introduction to neural networksAn introduction to neural networks
An introduction to neural networkstuxette
 
An introduction to neural network
An introduction to neural networkAn introduction to neural network
An introduction to neural networktuxette
 
About functional SIR
About functional SIRAbout functional SIR
About functional SIRtuxette
 
Adaptive Machine Learning for Credit Card Fraud Detection
Adaptive Machine Learning for Credit Card Fraud DetectionAdaptive Machine Learning for Credit Card Fraud Detection
Adaptive Machine Learning for Credit Card Fraud DetectionAndrea Dal Pozzolo
 
FA Falls 2 Teresa O'Callaghan
FA Falls 2 Teresa O'CallaghanFA Falls 2 Teresa O'Callaghan
FA Falls 2 Teresa O'Callaghananne spencer
 
On-line, non-clairvoyant optimization of workflow activity granularity task o...
On-line, non-clairvoyant optimization of workflow activity granularity task o...On-line, non-clairvoyant optimization of workflow activity granularity task o...
On-line, non-clairvoyant optimization of workflow activity granularity task o...Rafael Ferreira da Silva
 
Cheatsheet machine-learning-tips-and-tricks
Cheatsheet machine-learning-tips-and-tricksCheatsheet machine-learning-tips-and-tricks
Cheatsheet machine-learning-tips-and-tricksSteve Nouri
 
Goodness–of–fit tests for regression models: the functional data case
Goodness–of–fit tests for regression models: the functional data caseGoodness–of–fit tests for regression models: the functional data case
Goodness–of–fit tests for regression models: the functional data caseNeuroMat
 
An introduction to digital health surveillance from online user-generated con...
An introduction to digital health surveillance from online user-generated con...An introduction to digital health surveillance from online user-generated con...
An introduction to digital health surveillance from online user-generated con...Vasileios Lampos
 
Prediction model of algal blooms using logistic regression and confusion matrix
Prediction model of algal blooms using logistic regression and confusion matrix Prediction model of algal blooms using logistic regression and confusion matrix
Prediction model of algal blooms using logistic regression and confusion matrix IJECEIAES
 
Data-Driven Recommender Systems
Data-Driven Recommender SystemsData-Driven Recommender Systems
Data-Driven Recommender Systemsrecsysfr
 
Knowledge-based generalization for metabolic models
Knowledge-based generalization for metabolic modelsKnowledge-based generalization for metabolic models
Knowledge-based generalization for metabolic modelsAnna Zhukova
 
Workflow fairness control on online and non-clairvoyant distributed computing...
Workflow fairness control on online and non-clairvoyant distributed computing...Workflow fairness control on online and non-clairvoyant distributed computing...
Workflow fairness control on online and non-clairvoyant distributed computing...Rafael Ferreira da Silva
 

Similar to A comparison of learning methods to predict N2O fluxes and N leaching (20)

A short introduction to statistical learning
A short introduction to statistical learningA short introduction to statistical learning
A short introduction to statistical learning
 
A comparison of learning methods to predict N2O fluxes and N leaching
A comparison of learning methods to predict N2O fluxes and N leachingA comparison of learning methods to predict N2O fluxes and N leaching
A comparison of learning methods to predict N2O fluxes and N leaching
 
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
 
An introduction to neural networks
An introduction to neural networksAn introduction to neural networks
An introduction to neural networks
 
An introduction to neural network
An introduction to neural networkAn introduction to neural network
An introduction to neural network
 
About functional SIR
About functional SIRAbout functional SIR
About functional SIR
 
Adaptive Machine Learning for Credit Card Fraud Detection
Adaptive Machine Learning for Credit Card Fraud DetectionAdaptive Machine Learning for Credit Card Fraud Detection
Adaptive Machine Learning for Credit Card Fraud Detection
 
FA Falls 2 Teresa O'Callaghan
FA Falls 2 Teresa O'CallaghanFA Falls 2 Teresa O'Callaghan
FA Falls 2 Teresa O'Callaghan
 
Nonnegative Matrix Factorization with Side Information for Time Series Recove...
Nonnegative Matrix Factorization with Side Information for Time Series Recove...Nonnegative Matrix Factorization with Side Information for Time Series Recove...
Nonnegative Matrix Factorization with Side Information for Time Series Recove...
 
On-line, non-clairvoyant optimization of workflow activity granularity task o...
On-line, non-clairvoyant optimization of workflow activity granularity task o...On-line, non-clairvoyant optimization of workflow activity granularity task o...
On-line, non-clairvoyant optimization of workflow activity granularity task o...
 
Cheatsheet machine-learning-tips-and-tricks
Cheatsheet machine-learning-tips-and-tricksCheatsheet machine-learning-tips-and-tricks
Cheatsheet machine-learning-tips-and-tricks
 
MUMS: Bayesian, Fiducial, and Frequentist Conference - Inference on Treatment...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Inference on Treatment...MUMS: Bayesian, Fiducial, and Frequentist Conference - Inference on Treatment...
MUMS: Bayesian, Fiducial, and Frequentist Conference - Inference on Treatment...
 
Goodness–of–fit tests for regression models: the functional data case
Goodness–of–fit tests for regression models: the functional data caseGoodness–of–fit tests for regression models: the functional data case
Goodness–of–fit tests for regression models: the functional data case
 
Georgia Tech 2017 March Talk
Georgia Tech 2017 March TalkGeorgia Tech 2017 March Talk
Georgia Tech 2017 March Talk
 
An introduction to digital health surveillance from online user-generated con...
An introduction to digital health surveillance from online user-generated con...An introduction to digital health surveillance from online user-generated con...
An introduction to digital health surveillance from online user-generated con...
 
Prediction model of algal blooms using logistic regression and confusion matrix
Prediction model of algal blooms using logistic regression and confusion matrix Prediction model of algal blooms using logistic regression and confusion matrix
Prediction model of algal blooms using logistic regression and confusion matrix
 
Data-Driven Recommender Systems
Data-Driven Recommender SystemsData-Driven Recommender Systems
Data-Driven Recommender Systems
 
Business statistics-ii-aarhus-bss
Business statistics-ii-aarhus-bssBusiness statistics-ii-aarhus-bss
Business statistics-ii-aarhus-bss
 
Knowledge-based generalization for metabolic models
Knowledge-based generalization for metabolic modelsKnowledge-based generalization for metabolic models
Knowledge-based generalization for metabolic models
 
Workflow fairness control on online and non-clairvoyant distributed computing...
Workflow fairness control on online and non-clairvoyant distributed computing...Workflow fairness control on online and non-clairvoyant distributed computing...
Workflow fairness control on online and non-clairvoyant distributed computing...
 

More from tuxette

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathstuxette
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènestuxette
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquestuxette
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-Ctuxette
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?tuxette
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...tuxette
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquestuxette
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeantuxette
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...tuxette
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquestuxette
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...tuxette
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...tuxette
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation datatuxette
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?tuxette
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysistuxette
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricestuxette
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Predictiontuxette
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelstuxette
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random foresttuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICStuxette
 

More from tuxette (20)

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en maths
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènes
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiques
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-C
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiques
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWean
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation data
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysis
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random forest
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 

Recently uploaded

All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisDiwakar Mishra
 
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service  🪡
CALL ON ➥8923113531 🔝Call Girls Kesar Bagh Lucknow best Night Fun service 🪡anilsa9823
 
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |
Call Us ≽ 9953322196 ≼ Call Girls In Mukherjee Nagar(Delhi) |aasikanpl
 
A comparison of learning methods to predict N2O fluxes and N leaching

  • 14. Regression. Consider the problem where: $Y \in \mathbb{R}$ has to be estimated from $X \in \mathbb{R}^d$; we are given a learning set, i.e., $n$ i.i.d. observations of $(X, Y)$: $(x_1, y_1), \ldots, (x_n, y_n)$. Example: predict N2O fluxes from pH, climate, concentration of N in rain, fertilization... for a large number of HSMU.
  • 15–20. Basics. From $(x_i, y_i)_i$, a machine $\Phi^n$ is defined s.t. $\hat{y}_{\text{new}} = \Phi^n(x_{\text{new}})$. If $Y$ is numeric, $\Phi^n$ is called a regression function; if $Y$ is a factor, $\Phi^n$ is called a classifier. $\Phi^n$ is said to be trained or learned from the observations $(x_i, y_i)_i$. Desirable properties: accuracy to the observations (predictions made on known data are close to observed values) and generalization ability (predictions made on new data are also accurate). Conflicting objectives! [Vapnik, 1995]
  • 21–26. Underfitting/Overfitting (sequence of figures): the function $x \mapsto y$ to be estimated; observations we might have; observations we do have; a first estimation from the observations (underfitting); a second estimation (accurate); a third estimation (overfitting).
  • 28–33. Errors. Training error (measures the accuracy to the observations): if $y$ is a factor, the misclassification rate $\frac{\#\{i : \hat{y}_i \neq y_i,\ i = 1, \ldots, n\}}{n}$; if $y$ is numeric, the mean squared error (MSE) $\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$, the root mean squared error (RMSE), or the pseudo-$R^2$: $1 - \mathrm{MSE}/\mathrm{Var}((y_i)_i)$. Test error: a way to prevent overfitting (it estimates the generalization error) is simple validation: 1. split the data into training/test sets (usually 80%/20%); 2. train $\Phi^n$ on the training set; 3. compute the test error on the remaining data.
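As a concrete illustration (not from the original slides), here is a minimal Python sketch of these error measures under a simple validation split; the synthetic data and the k-nearest-neighbours regressor are arbitrary stand-ins:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data (any regressor/dataset would do).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(500, 3))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=500)

# 1. Split into training/test sets (80%/20%).
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 2. Train a machine Phi_n on the training set.
phi_n = KNeighborsRegressor(n_neighbors=5).fit(X_tr, y_tr)

# 3. Compute the test errors on the remaining data.
y_hat = phi_n.predict(X_te)
mse = np.mean((y_hat - y_te) ** 2)
rmse = np.sqrt(mse)
pseudo_r2 = 1 - mse / np.var(y_te)
print(f"MSE={mse:.4f}  RMSE={rmse:.4f}  pseudo-R2={pseudo_r2:.4f}")
```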
  • 34–37. Example (figures only): the observations; the training and test errors; summary.
  • 38. Consistency in the parametric/nonparametric case. Example in the parametric framework (linear methods): an assumption is made on the form of the relation between $X$ and $Y$: $Y = \beta^T X + \epsilon$. $\beta$ is estimated from the observations $(x_1, y_1), \ldots, (x_n, y_n)$ by a given method which calculates a $\beta^n$. The estimation is said to be consistent if $\beta^n \xrightarrow{n \to +\infty} \beta$, under (possibly) technical assumptions on $X$, $\epsilon$, $Y$.
  • 39. Consistency in the parametric/nonparametric case. Example in the nonparametric framework: the form of the relation between $X$ and $Y$ is unknown: $Y = \Phi(X) + \epsilon$. $\Phi$ is estimated from the observations $(x_1, y_1), \ldots, (x_n, y_n)$ by a given method which calculates a $\Phi^n$. The estimation is said to be consistent if $\Phi^n \xrightarrow{n \to +\infty} \Phi$, under (possibly) technical assumptions on $X$, $\epsilon$, $Y$.
  • 40–41. Consistency from the statistical learning perspective [Vapnik, 1995]. Question: are we really interested in estimating $\Phi$, or rather in having the smallest prediction error? Statistical learning perspective: a method that builds a machine $\Phi^n$ from the observations is said to be (universally) consistent if, given a risk function $R : \mathbb{R} \times \mathbb{R} \to \mathbb{R}^+$ (which calculates an error), $\mathbb{E}\left(R(\Phi^n(X), Y)\right) \xrightarrow{n \to +\infty} \inf_{\Phi : \mathcal{X} \to \mathbb{R}} \mathbb{E}\left(R(\Phi(X), Y)\right)$, for any distribution of $(X, Y) \in \mathcal{X} \times \mathbb{R}$. Definitions: $L^* = \inf_{\Phi : \mathcal{X} \to \mathbb{R}} \mathbb{E}\left(R(\Phi(X), Y)\right)$ and $L_\Phi = \mathbb{E}\left(R(\Phi(X), Y)\right)$.
  • 42–43. Purpose of the work. We focus on methods that are universally consistent, i.e., that lead to machines $\Phi^n$ such that $\mathbb{E}\left(R(\Phi^n(X), Y)\right) \xrightarrow{n \to +\infty} L^* = \inf_{\Phi : \mathbb{R}^d \to \mathbb{R}} L_\Phi$ for any random pair $(X, Y)$: 1. multi-layer perceptrons (neural networks) [Bishop, 1995]; 2. Support Vector Machines (SVM) [Boser et al., 1992]; 3. random forests [Breiman, 2001] (universal consistency is not proven in this case).
  • 44–46. Methodology. Purpose: comparison of several metamodeling approaches (accuracy, computational time...). For every data set, every output and every method: 1. the data set was split into a training set and a test set (on an 80%/20% basis); 2. the regression function was learned from the training set (with a full validation process for hyperparameter tuning); 3. the performances were calculated on the test set: predictions were made from the inputs and compared to the true outputs.
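To make this protocol concrete, here is a minimal Python sketch (the Ridge model, its alpha grid and the synthetic data are illustrative stand-ins, not the study's actual setup):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 11))                      # 11 inputs, as in the data
y = X @ rng.normal(size=11) + 0.1 * rng.normal(size=1000)

# 1. Split into training (80%) and test (20%) sets.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 2. Learn the regression function, tuning the hyperparameter by simple
#    validation on a subset of the training set.
X_fit, X_val, y_fit, y_val = train_test_split(X_tr, y_tr, test_size=0.25, random_state=0)
best_alpha = min((1e-3, 1e-1, 1.0, 10.0),
                 key=lambda a: np.mean((Ridge(alpha=a).fit(X_fit, y_fit)
                                        .predict(X_val) - y_val) ** 2))
model = Ridge(alpha=best_alpha).fit(X_tr, y_tr)

# 3. Compute the performances on the test set.
test_mse = np.mean((model.predict(X_te) - y_te) ** 2)
print(f"alpha={best_alpha}, test MSE={test_mse:.4f}, "
      f"pseudo-R2={1 - test_mse / np.var(y_te):.4f}")
```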
  • 47. Methods. 2 linear models: one with the 11 explanatory variables, and one with the 11 explanatory variables plus several nonlinear transformations of them (square, log...), trained with stepwise AIC. MLP, SVM and RF. 3 approaches based on splines: ACOSSO (ANOVA splines), SDR (an improvement of the former) and DACE (a kriging-based approach).
  • 48. Outline. 1 DNDC-Europe model description; 2 Methodology (underfitting/overfitting, consistency, problem at stake); 3 Presentation of the different methods; 4 Results.
  • 49. Multilayer perceptrons (MLP). A “one-hidden-layer perceptron” takes the form $\Phi_w : x \in \mathbb{R}^d \mapsto \sum_{i=1}^{Q} w^{(2)}_i G\left(x^T w^{(1)}_i + w^{(0)}_i\right) + w^{(2)}_0$, where: the $w$ are the weights of the MLP, to be learned from the learning set; $G$ is a given activation function, typically $G(z) = \frac{1 - e^{-z}}{1 + e^{-z}}$; $Q$ is the number of neurons on the hidden layer. $Q$ controls the flexibility of the MLP and is a hyperparameter that is usually tuned during the learning process.
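A minimal numpy sketch of this forward pass (the weight shapes and the toy sizes are illustrative assumptions):

```python
import numpy as np

def G(z):
    # Activation from the slide: G(z) = (1 - e^-z) / (1 + e^-z), i.e. tanh(z/2).
    return (1 - np.exp(-z)) / (1 + np.exp(-z))

def phi_w(x, W1, w0, w2, w2_0):
    """One-hidden-layer perceptron with Q hidden neurons.
    x: (d,), W1: (Q, d), w0: (Q,), w2: (Q,), w2_0: scalar."""
    hidden = G(W1 @ x + w0)       # x^T w_i^(1) + w_i^(0) for each neuron i
    return w2 @ hidden + w2_0     # sum_i w_i^(2) G(...) + w_0^(2)

# Tiny example with d = 11 inputs and Q = 5 hidden neurons.
rng = np.random.default_rng(0)
d, Q = 11, 5
x = rng.normal(size=d)
print(phi_w(x, rng.normal(size=(Q, d)), rng.normal(size=Q),
            rng.normal(size=Q), rng.normal()))
```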
  • 50. Symbolic representation of an MLP (figure only): the inputs $x_1, \ldots, x_d$ feed the hidden neurons $1, \ldots, Q$ through the weights $w^{(1)}$, and the hidden layer is combined into the output $\Phi_w(x)$ through the weights $w^{(2)}$ and the biases $w^{(0)}$.
  • 51–54. Learning MLP. Learning the weights: the $w$ are learned by a mean squared error minimization scheme, penalized by a weight decay to avoid overfitting (and ensure a better generalization ability): $w^* = \arg\min_w \sum_{i=1}^{n} L(y_i, \Phi_w(x_i)) + C \|w\|^2$. Problem: the MSE is not quadratic in $w$, so some solutions can be local minima. Tuning the hyperparameters $C$ and $Q$: simple validation was used.
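For illustration, scikit-learn's MLPRegressor (a stand-in, not necessarily the implementation used in the study) exposes the weight decay as alpha and $Q$ as hidden_layer_sizes, so the simple-validation tuning can be sketched as:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(300, 11))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=300)
X_fit, X_val, y_fit, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

best, best_mse = None, np.inf
for Q in (5, 10, 20):                   # hidden-layer size
    for alpha in (1e-4, 1e-2, 1.0):     # weight-decay strength (the C above)
        mlp = MLPRegressor(hidden_layer_sizes=(Q,), activation="tanh",
                           alpha=alpha, max_iter=2000, random_state=0)
        mlp.fit(X_fit, y_fit)           # gradient-based: local minima possible
        mse = np.mean((mlp.predict(X_val) - y_val) ** 2)
        if mse < best_mse:
            best, best_mse = (Q, alpha), mse
print("selected (Q, weight decay):", best)
```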
  • 55–58. SVM. SVM is also an algorithm based on penalized error loss minimization. 1. Basic linear SVM for regression: $\Phi_{(w,b)}$ is of the form $x \mapsto w^T x + b$, with $(w, b)$ solution of $\arg\min_{(w,b)} \sum_{i=1}^{n} L_\epsilon(y_i, \Phi_{(w,b)}(x_i)) + \lambda \|w\|^2$, where $\lambda$ is a regularization (hyper)parameter to be tuned and $L_\epsilon(y, \hat{y}) = \max\{|y - \hat{y}| - \epsilon, 0\}$ is the $\epsilon$-insensitive loss function (illustrated on the last slide). 2. Nonlinear SVM for regression is the same, except that a fixed nonlinear transformation of the inputs is made first: $\varphi(x) \in \mathcal{H}$ is used instead of $x$. Kernel trick: in fact, $\varphi$ is never made explicit but is used through a kernel $K : \mathbb{R}^d \times \mathbb{R}^d \to \mathbb{R}$ with $K(x_i, x_j) = \langle \varphi(x_i), \varphi(x_j) \rangle$. Common kernel: the Gaussian kernel $K_\gamma(u, v) = e^{-\gamma \|u - v\|^2}$, known to have good theoretical properties both for accuracy and generalization.
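A small numpy illustration of the Gaussian kernel matrix, the only potentially costly ingredient as noted on the next slide (the data here are arbitrary):

```python
import numpy as np

def gaussian_kernel_matrix(X, gamma):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2) for all pairs of rows of X."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.clip(sq_dists, 0, None))

X = np.random.default_rng(0).normal(size=(6, 11))
K = gaussian_kernel_matrix(X, gamma=0.5)   # symmetric, ones on the diagonal
print(K.round(3))
```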
  • 59–61. Learning SVM. Learning $(w, b)$: $w = \sum_{i=1}^{n} \alpha_i K(x_i, \cdot)$ and $b$ are calculated by an exact optimization scheme (quadratic programming). The only step that can be time-consuming is the computation of the kernel matrix $K(x_i, x_j)$ for $i, j = 1, \ldots, n$. The resulting $\Phi^n$ is known to be of the form $\Phi^n(x) = \sum_{i=1}^{n} \alpha_i K(x_i, x) + b$, where only a few $\alpha_i$ are nonzero; the corresponding $x_i$ are called support vectors. Tuning the hyperparameters $C = 1/\lambda$ and $\gamma$: simple validation was used. To limit computation time, $\epsilon$ was not tuned in our experiments but set to the default value (1), which ensured at most $0.5n$ support vectors.
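With scikit-learn's SVR (again a stand-in for whatever implementation was actually used), the Gaussian kernel, C and gamma map directly onto the quantities above, and the support vectors can be inspected after fitting:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(size=(400, 11))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=400)

# Gaussian (RBF) kernel SVM for regression with epsilon-insensitive loss.
svr = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=0.1).fit(X, y)
n_sv = svr.support_vectors_.shape[0]    # the x_i with nonzero alpha_i
print(f"{n_sv} support vectors out of {len(X)} observations")
```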
  • 62. From regression tree to random forest. Example of a regression tree (figure only: a tree splitting on SOCt, PH, FR and clay, with leaf predictions ranging from 2.685 to 59.330). Each split is made such that the two induced subsets have the greatest possible homogeneity. The prediction of a final node is the mean of the $Y$ values of the observations belonging to this node.
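A hedged scikit-learn sketch of such a tree (the variable names follow the data description, but the data and the fitted tree are illustrative):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
# Stand-in data with two of the explanatory variables.
X = np.column_stack([rng.uniform(0, 0.3, 500),     # SOC
                     rng.uniform(4, 9, 500)])      # pH
y = 100 * X[:, 0] + 2 * X[:, 1] + rng.normal(size=500)

tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
# Each split maximizes the homogeneity (minimizes the variance) of the
# two induced subsets; each leaf predicts the mean y of its observations.
print(export_text(tree, feature_names=["SOC", "pH"]))
```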
  • 63–64. Random forest. Basic principle: combination of a large number of under-efficient regression trees (the prediction is the mean prediction over all trees). For each tree, two simplifications of the original method are performed: 1. a given number of observations is randomly chosen from the training set; this subset is called the in-bag sample, while the other observations, called out-of-bag, are used to control the error of the tree; 2. for each node of the tree, a given number of variables is randomly chosen among all possible explanatory variables, and the best split is then calculated on the basis of these variables and the chosen observations. The chosen observations are the same for a given tree, whereas the variables taken into account change at each split.
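In scikit-learn (one possible implementation, used here purely for illustration), these two randomizations correspond to bootstrap sampling and the max_features argument:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 11))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=500)

rf = RandomForestRegressor(
    n_estimators=500,     # number of under-efficient trees to average
    max_features="sqrt",  # variables drawn at random for each split
    bootstrap=True,       # in-bag sample drawn for each tree
    oob_score=True,       # control the error with the out-of-bag observations
    random_state=0,
).fit(X, y)
print("OOB R2:", rf.oob_score_)
```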
  • 65–67. Additional tools. OOB (out-of-bag) error: error based on the OOB predictions; stabilization of the OOB error is a good indication that there are enough trees in the forest. Importance of a variable, to help interpretation: for a given variable $X^j$ ($j \in \{1, \ldots, p\}$), the importance of $X^j$ is the mean decrease in accuracy obtained when the values of $X^j$ are randomized: $I(X^j) = \mathbb{E}\left(R(\Phi^n(X^{(j)}), Y)\right) - \mathbb{E}\left(R(\Phi^n(X), Y)\right)$, in which $X^{(j)} = (X^1, \ldots, \tilde{X}^j, \ldots, X^p)$, $\tilde{X}^j$ being the variable $X^j$ with permuted values. Importance is estimated with the OOB observations (see the next slides for details).
  • 68–69. Learning a random forest. Random forests are not very sensitive to their hyperparameters (number of observations for each tree, number of variables for each split): the default values have been used. The number of trees should be large enough for the mean squared error based on out-of-sample observations to stabilize (figure only: out-of-bag and test errors as a function of the number of trees, from 0 to 500).
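A sketch of how such a diagnostic curve can be produced; growing the same scikit-learn forest incrementally with warm_start is one convenient way (an implementation choice of this sketch, not the original study's):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 11))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.normal(size=500)

# Grow one forest in increments and track the OOB error as trees are added.
rf = RandomForestRegressor(n_estimators=50, warm_start=True,
                           oob_score=True, random_state=0)
for n_trees in (50, 100, 200, 300, 400, 500):
    rf.set_params(n_estimators=n_trees)
    rf.fit(X, y)
    oob_mse = np.mean((rf.oob_prediction_ - y) ** 2)
    print(f"{n_trees:4d} trees  OOB MSE = {oob_mse:.4f}")
```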
  • 70. Importance estimation in random forests. OOB estimation for variable $X^j$:
1: for $b = 1 \to B$ do (loop on trees)
2:   permute the values $(x^j_i)_i$ over the OOB sample $\bar{\mathcal{T}}^b$: each $x_i \in \bar{\mathcal{T}}^b$ yields $x^{(j,b)}_i = (x^1_i, \ldots, \tilde{x}^{j,b}_i, \ldots, x^p_i)$, with $\tilde{x}^{j,b}_i$ the permuted values
3:   predict $\Phi^b\left(x^{(j,b)}_i\right)$
4: end for
5: return the OOB estimation of the importance of $X^j$:
$$\frac{1}{B} \sum_{b=1}^{B} \left[ \frac{1}{|\bar{\mathcal{T}}^b|} \sum_{x_i \in \bar{\mathcal{T}}^b} \left(\Phi^b(x^{(j,b)}_i) - y_i\right)^2 - \frac{1}{|\bar{\mathcal{T}}^b|} \sum_{x_i \in \bar{\mathcal{T}}^b} \left(\Phi^b(x_i) - y_i\right)^2 \right]$$
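scikit-learn's permutation_importance applies the same idea, with the simplification (an assumption of this sketch) that the permutation is done on one held-out set rather than per-tree OOB samples:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 11))
y = 3 * X[:, 0] + np.sin(2 * np.pi * X[:, 1]) + 0.1 * rng.normal(size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)

# Importance of X^j = increase in MSE when the values of X^j are permuted.
imp = permutation_importance(rf, X_te, y_te, n_repeats=10,
                             scoring="neg_mean_squared_error", random_state=0)
for j in np.argsort(imp.importances_mean)[::-1][:3]:
    print(f"X^{j}: mean MSE increase {imp.importances_mean[j]:.4f}")
```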
  • 71. Outline. 1 DNDC-Europe model description; 2 Methodology (underfitting/overfitting, consistency, problem at stake); 3 Presentation of the different methods; 4 Results.
  • 72–73. Influence of the training sample size (figures only): pseudo-$R^2$ on the test set as a function of the log training-set size, for N2O prediction and for N leaching prediction, with curves for LM1, LM2, Dace, SDR, ACOSSO, MLP, SVM and RF.
  • 74. Computational time.

      Use         LM1     LM2      Dace     SDR       ACOSSO
      Train       <1 s    50 min   80 min   4 hours   65 min
      Prediction  <1 s    <1 s     90 s     14 min    4 min

      Use         MLP        SVM       RF
      Train       2.5 hours  5 hours   15 min
      Prediction  1 s        20 s      5 s

  Time for DNDC: about 200 hours with a desktop computer and about 2 days using a cluster!
  • 75. Further comparisons. Evaluation of the different steps (time/difficulty):

      Method   Training   Validation   Test
      LM1      ++         —            +
      LM2      +          —            +
      ACOSSO   =          +            -
      SDR      =          +            -
      DACE     =          -            -
      MLP      -          -            +
      SVM      =          -            -
      RF       +          +            +
  • 76. Understanding which inputs are important. Example (N2O, RF) (figure only: the 11 inputs ranked by importance, measured as mean decrease in MSE). The variables SOC and pH are the most important for accurate predictions.
  • 77. Understanding which inputs are important. Example (N leaching, SVM) (figure only: the 11 inputs ranked by importance, measured as decrease in MSE). The variables N_MR, N_FR, Nres and pH are the most important for accurate predictions.
  • 78. Thank you for your attention... questions?
  • 79. References:
Bishop, C. (1995). Neural Networks for Pattern Recognition. Oxford University Press, New York, USA.
Boser, B., Guyon, I., and Vapnik, V. (1992). A training algorithm for optimal margin classifiers. In 5th Annual ACM Workshop on COLT, pages 144–152. D. Haussler, editor, ACM Press.
Breiman, L. (2001). Random forests. Machine Learning, 45(1):5–32.
Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer Verlag, New York, USA.
Villa-Vialaneix, N., Follador, M., Ratto, M., and Leip, A. (2012). A comparison of eight metamodeling techniques for the simulation of N2O fluxes and N leaching from corn crops. Environmental Modelling and Software, 34:51–66.
  • 80. The $\epsilon$-insensitive loss function (figure only).