Machine Learning Algorithms: Theory, Applications and Software Tools
Lecture 2
Basics of ANN: MLP

Prof. Mikhail Kanevski
Institute of Geomatics and Analysis of Risk, University of Lausanne
Mikhail.Kanevski@unil.ch
Contents

• Introduction to artificial neural networks
• Multilayer perceptron
• Case studies




Basics of ANN
Artificial neural networks are analytical systems that
  address problems whose solutions have not been
  explicitly formulated.
In this way they contrast with classical computers and
  computer programs, which are designed to solve
  problems whose solutions - although they may be
  extremely complex - have been made explicit.




Basics of ANN
• We can program or train neural networks to store,
  recognise, and associatively retrieve patterns;
• to filter noise from measurement data;
• to control ill-defined problems;
in summary:
• to estimate sampled functions when we do not
  know the form of the functions.



Basics of ANN
Unlike statistical estimators, they estimate a function
 without a mathematical model of how outputs
 depend on inputs.
Neural networks are model-semifree estimators
 (semiparametric models). They "learn from
 experience" with numerical and, sometimes,
 linguistic sample data.




Basics of ANN
The major applications of ANN:
     • Feature recognition (pattern classification). Speech
       recognition
     • Signal processing
     • Time-series prediction
     • Function approximation and regression, classification
     • Data Mining
     • Intelligent control
     • Associative memories
     • Optimisation
     • And many others




Basics of ANN.
Simple biological neuron




Basics of ANN
Simple model of the neuron




Examples of transfer functions

f(x) = \frac{1}{1 + \exp(-x)}

\tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}
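As a minimal sketch (not part of the original slides), these two transfer functions can be written directly with NumPy:

import numpy as np

def sigmoid(x):
    # Logistic transfer function: f(x) = 1 / (1 + exp(-x)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Hyperbolic tangent: (exp(x) - exp(-x)) / (exp(x) + exp(-x)), output in (-1, 1)
    return np.tanh(x)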
Basics of ANN

The main parts of ANN:
• Neurones (nodes, cells, units, processing elements)
• Network topology (connections between neurones)
Basics of ANN

In general, Artificial Neural Networks are
  a collection of simple computational
  units (cells) interlinked by a system of
  connections (synaptic connections).
The number of units and the connections between them form the network topology.
Multilayer perceptron




Basics of ANN.
ANN learning/training

Supervised learning is the most common form of training. Many samples (Input(i), Output(i)) are prepared as a training set. A subset of the training data set is then selected, and its samples are presented to the network one by one. For each sample, the result obtained by the network, O[Input(i)], is compared with the desired Output(i). After the entire training subset has been presented, the weights are updated in such a way that a measure of the error between the network's outputs and the desired outputs is reduced. One pass through the subset of training samples, together with an update of the weights, is called an epoch. The number of samples in the subset is called the epoch size. Sometimes an epoch size of one is used.
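A hedged sketch of this epoch-based scheme (not code from the lecture); the `network` object with `gradient` and `update` methods is a hypothetical stand-in for any MLP implementation:

import random

def train(network, samples, n_epochs, epoch_size, learning_rate):
    # samples is a list of (input, target) pairs; epoch_size samples are
    # presented one by one, then the weights are updated once per epoch.
    for _ in range(n_epochs):
        subset = random.sample(samples, epoch_size)   # subset presented this epoch
        total_gradient = None
        for x, t in subset:                           # present samples one by one
            g = network.gradient(x, t)                # error gradient for this sample
            total_gradient = g if total_gradient is None else total_gradient + g
        network.update(-learning_rate * total_gradient)  # one weight update per epoch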
Basics of ANN.
ANN supervised learning.

[Block diagram: examples are presented both to the neural network and to a teacher; the evaluation of the network's response against the teacher's drives the learning algorithm, which makes modifications to the network.]
Basics of ANN
Feedforward ANN.

If there are no feedback and lateral connections, we have a feedforward ANN. The most frequently used model is the so-called multi-layer perceptron. The term feedforward means that information flows only in one direction - from the input to the output.
ANN Multi-layer Perceptron (MLP)

• Depends only on the data and its inner structure
• Is able to learn from data and generalise
• Good at modelling non-linearities
• Robust to noise and outliers

[ANN = artificial neurons + connection weights]
Basics of ANN

All the knowledge of an ANN is contained in the synaptic weights between units.
The Universality Property

• A two-layer feed-forward neural network with step activation functions can implement any Boolean function, provided that the number of hidden neurons H is sufficiently large.
MLP modelling

F_1(t, w) = w_1^{out} f(w_1 t + b_1) + b^{out},

F_2(t, w) = w_1^{out} f(w_1 t + b_1) + w_2^{out} f(w_2 t + b_2) + b^{out},

F_3(t, w) = w_1^{out} f(w_1 t + b_1) + w_2^{out} f(w_2 t + b_2) + w_3^{out} f(w_3 t + b_3) + b^{out}.
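A minimal sketch (not from the slides) of how such a one-input MLP output F_H(t, w) can be evaluated for H hidden neurons, assuming the logistic transfer function f shown earlier; with H = 1, 2, 3 this reproduces F1, F2 and F3 above:

import numpy as np

def sigmoid(x):
    # Logistic transfer function f(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def mlp_output(t, w_hidden, b_hidden, w_out, b_out):
    # F_H(t, w) = sum_h w_out[h] * f(w_hidden[h] * t + b_hidden[h]) + b_out
    # for a single input t and H hidden neurons (a 1-H-1 architecture).
    hidden = sigmoid(w_hidden * t + b_hidden)   # H hidden-neuron outputs
    return np.dot(w_out, hidden) + b_out        # linear output neuron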
Backpropagation training




Error function depends on the network's weights (W)

E_l(W) = \frac{1}{n} \sum_{j=0}^{n-1} \{ T_{lj} - Z_{lj}^{out}(W) \}^2
MLP training algorithms

Optimisation algorithms used for MLP training:
• Stochastic
    − Annealing
    − Genetic algorithm
• Gradient
   −   Conjugate gradients (slow 1st order gradient algorithm)
   −   Levenberg-Marquardt (fast 2nd order gradient algorithm)
   −   BFGS formula – quasi Newton
   −   Steepest Descent
   −   RProp – resilient propagation
   −   BackProp – back propagation
Feedforward ANN: Multilayer perceptron. Backprop algorithm

•   The possibilities and capabilities of multi-layer perceptrons stem from the non-linearities used within the nodes. An MLP can learn with a supervised learning rule - the backpropagation algorithm. The Backward Error Propagation algorithm for ANN learning/training caused a breakthrough in the application of multilayer perceptrons.
•   The backpropagation algorithm is a supervised learning algorithm. It is an iterative gradient algorithm designed to minimise the error measure between the actual output of the neural network and the desired output. We have to optimise a very non-linear system consisting of a large number of highly correlated variables.
Basics of ANN
Backpropagation Algorithm

The backpropagation algorithm follows these algorithmic steps:
• 1. Initialize weights. Usually it is recommended to set all weights and node offsets to small random values. In our study we shall use simulated annealing and/or a genetic algorithm to select starting values more intelligently, as recommended in [Masters].
• 2. Present inputs and desired outputs. The vectors (Input_l, Output_l = t_l) are presented to the network.
• 3. Calculate the actual output of the ANN.
Basics of ANN
            Backpropagation Algorithm
• 4. Calculate error measure and update the
  weights. Use a recursive algorithm starting at the
  output neurons (nodes) and working back to the
  first hidden layer - it is this backward propagation
  of output errors that inspired the name for this
  training algorithm. Update the weights W by




We want to know how to modify the weights in order to decrease the error function:

w_{ij}(t+1) - w_{ij}(t) \propto -\frac{\partial E(t)}{\partial w_{ij}(t)}
Basics of ANN
Backpropagation Algorithm

w_{ij}^{m}(n+1) = w_{ij}^{m}(n) + \eta \, \delta_i^{m} Z_j^{(m-1)}

where n is the iteration step, η is the rate of learning (0 < η ≤ 1), Z_j^{(m-1)} is the output of the j-th neurone in layer (m-1), and the error δ_i^{m} for the output layer is defined by the equation:
Basics of ANN
Backpropagation Algorithm

\delta_i^{out} = Z_i^{out} (1 - Z_i^{out}) (T_i - Z_i^{out})

\delta_i^{(h-1)} = Z_i^{h} (1 - Z_i^{h}) \sum_j w_{ij}^{h} \delta_j^{h}
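Putting the two delta equations and the weight-update rule together, a minimal sketch for a one-hidden-layer MLP with sigmoid units might look as follows (an illustration under these assumptions, not the lecture's own implementation):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, target, W1, b1, W2, b2, eta=0.1):
    # x: input vector, target: desired output vector,
    # W1, b1: hidden-layer weights/biases, W2, b2: output-layer weights/biases.
    # Forward pass
    z_hidden = sigmoid(W1 @ x + b1)
    z_out = sigmoid(W2 @ z_hidden + b2)

    # Output-layer deltas: delta_out = z(1 - z)(T - z)
    delta_out = z_out * (1.0 - z_out) * (target - z_out)
    # Hidden-layer deltas: propagate the output deltas back through W2
    delta_hidden = z_hidden * (1.0 - z_hidden) * (W2.T @ delta_out)

    # Weight updates: w(n+1) = w(n) + eta * delta_i * Z_j
    W2 += eta * np.outer(delta_out, z_hidden)
    b2 += eta * delta_out
    W1 += eta * np.outer(delta_hidden, x)
    b1 += eta * delta_hidden
    return W1, b1, W2, b2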
Basics of ANN
              Backpropagation Algorithm
Other error measures (such as maximum absolute error and
  median squared error) have even greater advantages in
  many situations. For example, median squared error is
  useful because unlike the mean the median is a robust
  statistic - its value is insensitive to occasional large errors
  in the training data. Unfortunately, practical techniques for
  implementing these more desirable error measures do not
  yet exist. Thus, most neural networks today are tied to
  mean squared error measurements.




Basics of ANN
Backpropagation Algorithm

More general error functions can be written that take into account the importance (weighting, declustering, economic criteria, etc.) of the samples presented to the network:

E_l(W) = \sum_{j=0}^{n-1} \{ T_{lj} - Z_{lj}^{out}(W) \}^2 \, \omega_{lj}
Gradient descent

[Figure: error surface J(w) with the direction of the gradient J'(w) and the minimum marked on the w axis.]
Gradient descent

[Figure: error surface J(w) with its minimum marked on the w axis.]
In reality the situation with the error function and the corresponding optimization problem is much more complicated:

the presence of multiple local minima!
Gradient descent
Local minima

[Figure: error surface with several local minima.]
Simulated annealing (SA): Illustration
How important are local
                      minima?
                        (Duda et al. 2001)
In computational practice, we do not want our
  network to be caught in a local minimum having
  high training error because this usually indicates
  that key features of the problem have not been
  learned by the network.
In such cases it is traditional to reinitialize the weights and train again, possibly also altering other parameters in the net.
How important are local
                   minima?
                    (Duda et al. 2001)

In many problems, convergence to a nonglobal minimum is acceptable if the error is nevertheless fairly low. Furthermore, common stopping criteria demand that training terminate even before the minimum is reached, and thus it is not essential that the network be converging toward the global minimum to achieve acceptable performance.
In short

The presence of multiple minima does not
 necessarily present difficulties in training
 nets, and a few simple heuristics can often
 overcome such problems (see next slide)




Practical techniques for
                  improving backpropagation
•   Activation function (sigmoid, hyperbolic tangent,..)
•   Scaling inputs
•   Training with noise (noise injection; see the sketch after this list)
•   Initializing weights (simulated annealing)
•   Regularization (weight decay)
•   Number of hidden layers
•   Learning parameters (rates, momentum,..)
•   Cost function
•   ………………………………….




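As an example of one item from the list above, training with noise (noise injection) can be sketched as follows; noise_std is a hypothetical tuning parameter, not a value from the lecture:

import numpy as np

def noisy_inputs(X, noise_std=0.05, rng=None):
    # Noise injection: perturb the training inputs with small Gaussian noise
    # each epoch, which acts as a regularizer and discourages overfitting.
    rng = rng if rng is not None else np.random.default_rng()
    return X + rng.normal(0.0, noise_std, size=X.shape)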
Interpretation of network's outputs

Consider the limit in which the size N of the training data set goes to infinity [Bishop 1995]. In this limit we can replace the finite sum over patterns in the sum-of-squares error with an integral of the form

E = \lim_{N \to \infty} \frac{1}{2N} \sum_{n=1}^{N} \sum_{k} \{ y_k(x^n; w) - t_k^n \}^2
  = \frac{1}{2} \sum_{k} \iint \{ y_k(x; w) - t_k \}^2 \, p(t_k, x) \, dt_k \, dx
Interpretation of network's outputs

The network mapping is given by the conditional average of the target data, i.e. the regression of t_k conditioned on x:

y_k(x; w^*) = \langle t_k \mid x \rangle
DEMO




MLP and number of layers

• The problem with an MLP using a single hidden layer is that the neurons tend to interact with each other globally. In complex situations, this interaction makes it difficult to improve the approximation at one point without worsening it at some other point.
• On the other hand, with two hidden layers, the approximation process becomes more manageable.
Two hidden layers! (Haykin)
1. Local features are extracted in the first hidden
   layer. Specifically, some neurons in the first
   hidden layer are used to partition the input space
   into regions, and other neurons in that layer
   learn the local features characterizing those
   regions.
2. Global features are extracted in the second
   layer. Specifically, a neuron in the second
   hidden layer combines the outputs of neurons in
   the first hidden layer operating on a particular
   region of the input space and thereby learns the
   global features for that region and outputs zero
   elsewhere.
Data Preprocessing

• Machine learning algorithms are data-driven methods.
• The quality and quantity of data is essential for training and generalization.

[Flow diagram: Input data → Pre-processing → MLA → Post-processing → Results]
Types of pre-processing:
1. Linear and nonlinear transformations
   e.g. input scaling/normalisation, Z-score transform, square root transform, N-score transform, etc.
2. Dimensionality reduction
3. Incorporate prior knowledge
   invariants, hints, ...
4. Feature extraction
   linear/nonlinear combination of input variables
5. Feature selection
   decide which features to use
Dimensionality reduction

• Two approaches are available to perform dimensionality reduction:
• Feature extraction: creating a subset of new features by combinations of the existing features
• Feature selection: choosing a subset of all the features (the most informative ones)
Feature selection/extraction




Feature selection

• Reducing the feature space by throwing
  out some of the features (covariates)
  – Also called variable selection
• Motivating idea: try to find a simple,
  “parsimonious” model (Occam’s razor!)



Univariate selection may fail




Guyon-Elisseeff, JMLR 2004; Springer 2006

Dimensionality Reduction

We clearly lose some information, but this can be helpful due to the curse of dimensionality.

We need some way of deciding which dimensions to keep:

1.   Random choice
2.   Principal components analysis (PCA) - see the sketch after this list
3.   Independent components analysis (ICA)
4.   Self-organised maps (SOM)
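As an illustration of option 2 (not from the slides), a bare-bones PCA-based reduction; the number of retained components is a hypothetical user choice:

import numpy as np

def pca_reduce(X, n_components):
    # Project the data X (n_samples x n_features) onto its first principal components.
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)        # covariance matrix of the features
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    return X_centered @ eigvecs[:, order]         # reduced (n_samples x n_components) data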
Data transform
•   Y = aZ + b
•   Y = Log(Z)
•   Y = Ind(Z, Zs)
•   Normalisation (Z-score): Y = (Z - Zm)/σ
•   Box-Cox nonlinear transform:

    Y(λ) = (Z^λ - 1)/λ    if λ > 0
    Y(λ = 0) = Ln(Z)
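A small sketch of the Z-score and Box-Cox transforms listed above (assuming NumPy arrays, and positive values for the Box-Cox case):

import numpy as np

def zscore(z):
    # Z-score normalisation: Y = (Z - Zm) / sigma
    return (z - z.mean()) / z.std()

def box_cox(z, lam):
    # Box-Cox transform for positive data; lam = 0 falls back to the log transform
    if lam == 0:
        return np.log(z)
    return (z ** lam - 1.0) / lam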
Model Selection & Model Evaluation




Guillaume d'Occam (1285 - 1349)

"Pluralitas non est ponenda sine necessitate"

Occam's razor:
"The simpler explanation of the phenomena is more likely to be correct"
Model Assessment and Model
         Selection:
    Two separate goals



Model Selection:

Estimating the performance of different
 models in order to choose the
 (approximate) best one

        Model Assessment:
Having chosen a final model, estimating its
 prediction error (generalization error) on
 new data

If we are in a data-rich situation, the best solution is to split the data randomly (?):

Raw Data → Train: 50% | Validation: 25% | Test: 25%
Interpretation

• The training set is used to fit the models

• The validation set is used to estimate prediction
  error for model selection (tuning
  hyperparameters)

• The test set is used for assessment of the
  generalization error of the final chosen model
Elements of Statistical Learning - Hastie, Tibshirani & Friedman, 2001
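A minimal sketch of such a random 50/25/25 split (assuming NumPy arrays; the proportions follow the previous slide):

import numpy as np

def split_data(X, y, seed=0):
    # Random 50/25/25 split into train, validation and test sets.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = len(X) // 2
    n_val = len(X) // 4
    train, val, test = np.split(idx, [n_train, n_train + n_val])
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])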
Bias and Variance. Model's complexity

[Figure: two example fits of the same data - panel c: underfitting; panel b: overfitting.]
One of the most serious problems that arises in connectionist learning by neural networks is overfitting of the provided training examples.
This means that the learned function fits the training data very closely; however, it does not generalise well, that is, it cannot model sufficiently well unseen data from the same task.
Solution: balance the statistical bias and statistical variance when doing neural network learning in order to achieve the smallest average generalization error.
Bias-Variance Dilemma

Assume that

Y = f(X) + ε, where E(ε) = 0, Var(ε) = σ_ε^2
We can derive an expression for the expected prediction error of a regression fit at an input point X = x_0, using squared-error loss:
Err(x_0) = E[(Y - \hat{f}(x_0))^2 \mid X = x_0]
         = \sigma_\varepsilon^2 + [E\hat{f}(x_0) - f(x_0)]^2 + E[\hat{f}(x_0) - E\hat{f}(x_0)]^2
         = \sigma_\varepsilon^2 + \mathrm{Bias}^2(\hat{f}(x_0)) + \mathrm{Var}(\hat{f}(x_0))
         = \text{Irreducible Error} + \text{Bias}^2 + \text{Variance}
• The first term is the variance of the target around its true mean f(x_0), and cannot be avoided no matter how well we estimate f(x_0), unless σ_ε^2 = 0.
• The second term is the squared bias, the amount by which the average of our estimate differs from the true mean.
• The last term is the variance, the expected squared deviation of \hat{f}(x_0) around its mean.
Elements of Statistical Learning - Hastie, Tibshirani & Friedman, 2001
• A neural network is only as good as the
  training data!

• Poor training data inevitably leads to an
  unreliable and unpredictable network.

• Exploratory Data Analysis and data
  preprocessing are extremely important!!!


MLP modelling. Case Studies.

[Maps: original data set (10,000 points) and the training subset (900 points).]
MLP modeling

[Six figure slides compare the original field with MLP predictions, each asking "Which result do you prefer?". Training statistics reported on the slides: RMSE 1.97, Ro 0.69; RMSE 1.61, Ro 0.80; RMSE 1.67, Ro 0.79; RMSE 1.10, Ro 0.92; RMSE 0.83, Ro 0.95; RMSE 0.55, Ro 0.98.]
MLP modeling
Training statistics

[Figure: training RMSE and Ro plotted against MLP architecture (5, 10, 5-5, 10-10, 15-15, 20-20); RMSE decreases and Ro increases with network size.]

Model 20-20 is the best?
MLP modeling
Training statistics

  MLP      RMSE    Ro
  5        1.97    0.69
  10       1.61    0.80
  5-5      1.67    0.79
  10-10    1.10    0.92
  15-15    0.83    0.95
  20-20    0.55    0.98
MLP modeling
Training & Validation statistics

[Figure: training and validation RMSE and Ro versus MLP architecture (5, 10, 5-5, 10-10, 15-15, 20-20); unlike the training curves, the validation error stops improving for the largest networks.]
MLP modeling
Validation statistics

  MLP      RMSE    Ro
  5        2.01    0.68
  10       1.66    0.80
  5-5      1.70    0.79
  10-10    1.25    0.89
  15-15    1.24    0.89
  20-20    1.39    0.88
ANNEX model: Artificial Neural Networks with External drift for environmental data mapping
Traditional application of ANN to spatial predictions

• Data are available at measurement points: F(xi, yi), for i = 1, ..., N
• Problem: predict F(x, y) at points without measurements (usually on a regular grid)
• ANN solution: x, y - 2 inputs, F - output
  - select the ANN architecture
  - train with the available data
  - after training, use the network to predict
ANNEX is similar to the "Kriging with External Drift" model:
If there is additional information (available at both training and prediction points) related to the primary variable, we can use it as additional inputs to the ANN.

Inputs: x, y, + fext(x, y)
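As a sketch of the idea (not code from the lecture), the ANNEX input matrix simply augments the coordinates with the external variable:

import numpy as np

def annex_inputs(xy, f_ext):
    # xy: (n, 2) array of coordinates; f_ext: (n,) array of the secondary variable
    # (e.g. altitude from a DEM) known at both training and prediction points.
    # Returns an (n, 3) input matrix [x, y, f_ext] for the MLP.
    return np.column_stack([xy, f_ext])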
Examples of external information
• Cheap information on a secondary variable
• Physical model of the phenomenon
• Remotely sensed images
• GIS data
• DEM data
Kriging with external drift
Kriging with external drift is the model in which the trend is limited to

E{F(x,y)} = m(x,y) = λ0 + λ1 fext(x,y)    (1)

where the smooth variability of the secondary variable is considered to be related (e.g., linearly correlated) to that of the primary variable F(x,y) being estimated.
In general, kriging with an external drift is a simple and efficient algorithm to incorporate a secondary variable in the estimation of the primary variable.
ANNEX model

What relationship between the primary and the external information should there be in the case of ANNEX?
ANNEX model
What does external "related" information bring (and how to measure it: correlation between variables?)
• Improved accuracy of prediction?
• Reduced uncertainty of prediction?
An important problem is related to the quality of the additional data: there is a dilemma between introducing new information and introducing new noise.
Case study: Kazakh Priaralie, monitoring network

[Map: 1,400,000 km² - 400 monitoring stations.]
Datasets

[Maps: average long-term air temperatures in June (°C); GIS DEM model.]
Correlation
Air temperature vs. Altitude

[Scatter plot.]
Train and Test datasets

[Map of train and test station locations.]
ANN and ANNEX models

  Model                          Correlation   RMSE   MAE    MRE
  2-7-5-1                        0.917         2.57   1.96   -0.02
  3-3-1                          0.989         0.96   0.73   -0.01
  3-5-1                          0.99          0.9    0.7    -0.007
  3-7-1                          0.991         0.85   0.66   -0.004
  3-8-1                          0.991         0.84   0.68   -0.001
  3-9-1                          0.991         0.88   0.69   -0.01
  3-10-1                         0.99          0.92   0.74   -0.01
  Kriging with external drift    0.984         1.19   0.91   -0.03
Scatter plots

[Predicted vs. measured values for Kriging, Cokriging, Drift Kriging and ANNEX.]
Mapping results

[Maps: Kriging, Cokriging, Drift Kriging, ANNEX.]
Modelling noisy "altitude" effect (100 %)

[Maps: altitude before and after adding noise.]
Scatter plots between variables (noisy 100 % altitude)

[Scatter plots for the train and test sets.]
Mapping noise results: ANNEX

[Map: air temperature (°C).]
Noise results

  Model                                          Correlation   RMSE   MAE     MRE
  Kriging                                        0.874         3.13   2.04    -0.06
  Kriging - external drift                       0.984         1.19   0.91    -0.03
  3-7-1                                          0.991         0.85   0.66    -0.004
  3-8-1                                          0.991         0.84   0.68    -0.001
  3-8-1 (100% noise)                             0.839         3.54   2.37    -0.13
  3-7-1 (10% noise) Test 1                       0.939         2.32   -1.49   -0.003
  Kriging - external drift (10% noise) Test 1    0.941         2.23   1.54    -0.06
  3-7-1 (10% noise) Test 2                       0.899         2.81   1.52    -0.08
  Kriging - external drift (10% noise) Test 2    0.903         2.81   1.59    -0.103
MLP: real case study


Wind fields in Switzerland




Modeling of wind fields with MLP
                       using regularization technique
                                              (pp 168-172 of the book)
Monitoring network:
111 stations in Switzerland
(80 training + 31 for validation)

Mapping of daily:
• Mean speed
• Maximum gust
• Average direction




Modeling of wind fields with MLP
                         and regularization technique
Monitoring network:
111 stations in Switzerland (80 training + 31 for validation)

Mapping of daily:
• Mean speed
• Maximum gust
• Average direction

Input information:
X,Y geographical coordinates
DEM (resolution 500 m)
23 DEM-based « geo-features »
Total: 26 features

Model:
MLP 26-20-20-3
Training of the MLP

Model:
MLP 26-20-20-3

Training:
• Random initialization
• 500 iterations of the
RPROP algorithm



Results: naïve approach
Results: noise injection regularization
Results: summary
• Noise injection regularization
• Without regularization (overfitting)
Conclusion

• MLP is a universal nonlinear tool for learning from and modelling data. It is an excellent exploratory tool.

• Its application demands deep expert knowledge and experience.

  • 10. Basics of ANN The main parts of ANN: • Neurones (nodes, cells, units, processing elements) • Network topology (connections between neurones) Prof. M. Kanevski 10
  • 11. Basics of ANN In general, Artificial Neural Networks are a collection of simple computational units (cells) interlinked by a system of connections (synaptic connections). The number of units and connections form a network topology. Prof. M. Kanevski 11
  • 12. Multilayer perceptron Prof. M. Kanevski 12
  • 13. Basics of ANN. ANN learning/training Supervised learning is the most common training. Many samples Input(i), Output(i) are prepared as a training set. Then a subset from the training data set is selected. Samples from this subset are presented to the network one by one. For each sample results obtained by the network O[(input(i)] are compared with the desired O[utput(i)]. After presenting the entire training subset the weights are updated. This updating is done in such a way that a measure of the error between the network's and desired outputs is reduced. One pass through the subset of training samples, along with an updating of the weights is called an epoch. The number of samples in the subset is called epoch size. Sometimes an epoch size of one is used . Prof. M. Kanevski 13
  • 14. Basics of ANN. ANN supervised learning. [Diagram: a teacher supplies examples; the neural network produces a response; the evaluation of the response feeds the learning algorithm, which makes modifications to the network.] Prof. M. Kanevski 14
  • 15. Basics of ANN Feedforward ANN. If there are no feedback or lateral connections, we have a feedforward ANN. The most frequently used model is the so-called multi-layer perceptron. The term feedforward means that information flows in only one direction, from the input to the output. Prof. M. Kanevski 15
  • 16. ANN Multi-layer Perceptron (MLP) • Depends only on the data and its inner structure • Is able to learn from data and generalise • Good at modelling non-linearities • Robust to noise and outliers [ANN = artificial neurons + connection weights] Prof. M. Kanevski 16
  • 17. Basics of ANN All knowledge of ANN is based on synaptic weights between units. Prof. M. Kanevski 17
  • 18. The Universality Property • A two layer feed-forward neural network with step activation functions can implement any Boolean function, provided that the number of hidden neurons H is sufficiently large. Prof. M. Kanevski 18
  • 19. MLP modelling $F_1(t,w) = w_1^{out} f(w_1 t + b_1) + b_{out}$, $F_2(t,w) = w_1^{out} f(w_1 t + b_1) + w_2^{out} f(w_2 t + b_2) + b_{out}$, $F_3(t,w) = w_1^{out} f(w_1 t + b_1) + w_2^{out} f(w_2 t + b_2) + w_3^{out} f(w_3 t + b_3) + b_{out}$. Prof. M. Kanevski 19
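For readers who prefer code, here is a minimal NumPy sketch of the one-dimensional models F1, F2, F3 above (a single hidden layer of logistic units feeding one linear output); the weight values in the example are arbitrary illustrations, not taken from the lecture.

import numpy as np

def f(x):
    # logistic transfer function f(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + np.exp(-x))

def mlp_1d(t, w_hid, b_hid, w_out, b_out):
    # F_K(t, w) = sum_k w_out[k] * f(w_hid[k] * t + b_hid[k]) + b_out
    hidden = f(np.outer(t, w_hid) + b_hid)   # shape (len(t), K)
    return hidden @ w_out + b_out

# Example: F_3 with three hidden neurons and arbitrary weights
t = np.linspace(-2.0, 2.0, 5)
print(mlp_1d(t, w_hid=np.array([1.0, -0.5, 2.0]),
             b_hid=np.array([0.0, 0.3, -1.0]),
             w_out=np.array([0.7, 1.2, -0.4]),
             b_out=0.1))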
  • 20. Backpropagation training Prof. M. Kanevski 20
  • 21. Error function depends on the network's weights (W): $E_l(W) = \frac{1}{n}\sum_{j=0}^{n-1}\{T_{lj} - Z_{lj}^{out}(W)\}^2$ Prof. M. Kanevski 21
  • 22. MLP training algorithms Optimisation algorithms used for MLP training: • Stochastic − Annealing − Genetic algorithm • Gradient − Conjugate gradients (slow 1st order gradient algorithm) − Levenberg-Marquardt (fast 2nd order gradient algorithm) − BFGS formula – quasi Newton − Steepest Descent − RProp – resilient propagation − BackProp – back propagation Prof. M. Kanevski 22
  • 23. Feedforward ANN: Multilayer perceptron. Backprop algorithm • The possibilities and capabilities of multi-layer perceptrons stem from the non-linearities used within nodes. The MLP can learn with a supervised learning rule, the backpropagation algorithm. The Backward Error Propagation algorithm for ANN learning/training caused a breakthrough in the application of multilayer perceptrons. • The backpropagation algorithm is a supervised, iterative gradient algorithm designed to minimise the error measure between the actual output of the neural network and the desired output. We have to optimise a very non-linear system consisting of a large number of highly correlated variables. Prof. M. Kanevski 23
  • 24. Basics of ANN Backpropagation Algorithm The backpropagation algorithm follows these steps: • 1. Initialize weights. It is usually recommended to set all weights and node offsets to small random values. In our study we shall use simulated annealing and/or a genetic algorithm to select starting values more intelligently, as recommended in [Masters]. • 2. Present inputs and desired outputs. The vectors (Input_l, Output_l = t_l) are presented to the network. • 3. Calculate the actual output of the ANN. Prof. M. Kanevski 24
  • 25. Basics of ANN Backpropagation Algorithm • 4. Calculate error measure and update the weights. Use a recursive algorithm starting at the output neurons (nodes) and working back to the first hidden layer - it is this backward propagation of output errors that inspired the name for this training algorithm. Update the weights W as shown on the following slides. Prof. M. Kanevski 25
  • 26. We want to know how to modify the weights in order to decrease the error function: $w_{ij}(t+1) - w_{ij}(t) \propto -\dfrac{\partial E(t)}{\partial w_{ij}(t)}$ Prof. M. Kanevski 26
  • 27. Basics of ANN Backpropagation Algorithm $w_{ij}^{m}(n+1) = w_{ij}^{m}(n) + \eta\,\delta_i^{m} Z_j^{(m-1)}$, where n is the iteration step, η is the rate of learning (0 < η ≤ 1), $Z_j^{(m-1)}$ is the output of the j-th neurone in layer (m−1), and the error $\delta_i$ for the output layer is defined by the equation on the next slide. Prof. M. Kanevski 27
  • 28. Basics of ANN Backpropagation Algorithm $\delta_i^{out} = Z_i^{out}(1 - Z_i^{out})(T_i - Z_i^{out})$ for the output layer, and $\delta_i^{(h-1)} = Z_i^{h}(1 - Z_i^{h})\sum_j w_{ij}^{h}\,\delta_j^{h}$ for the hidden layers. Prof. M. Kanevski 28
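A compact NumPy sketch of the update rules of slides 26 to 28 for one hidden layer of logistic units, processing one sample per step (an epoch size of one); the variable names (W1, W2, eta) and the toy dimensions are assumptions made for the illustration.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def backprop_step(x, t, W1, b1, W2, b2, eta=0.1):
    # forward pass
    z_h = sigmoid(W1 @ x + b1)           # hidden-layer outputs
    z_out = sigmoid(W2 @ z_h + b2)       # output-layer outputs
    # output-layer error: delta_out = z_out * (1 - z_out) * (t - z_out)
    delta_out = z_out * (1.0 - z_out) * (t - z_out)
    # hidden-layer error, propagated backwards through W2
    delta_h = z_h * (1.0 - z_h) * (W2.T @ delta_out)
    # weight updates: w_ij <- w_ij + eta * delta_i * z_j (previous layer)
    W2 += eta * np.outer(delta_out, z_h)
    b2 += eta * delta_out
    W1 += eta * np.outer(delta_h, x)
    b1 += eta * delta_h
    return 0.5 * np.sum((t - z_out) ** 2)   # squared error, for monitoring only

# toy usage: 2 inputs, 3 hidden neurons, 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.1, size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(scale=0.1, size=(1, 3)), np.zeros(1)
for _ in range(100):
    err = backprop_step(np.array([0.2, 0.8]), np.array([0.6]), W1, b1, W2, b2)
print(err)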
  • 29. Basics of ANN Backpropagation Algorithm Other error measures (such as maximum absolute error and median squared error) have even greater advantages in many situations. For example, median squared error is useful because unlike the mean the median is a robust statistic - its value is insensitive to occasional large errors in the training data. Unfortunately, practical techniques for implementing these more desirable error measures do not yet exist. Thus, most neural networks today are tied to mean squared error measurements. Prof. M. Kanevski 29
  • 30. Basics of ANN Backpropagation Algorithm More general error functions can be written taking into account the importance (weighting, declustering, economic criteria, etc.) of the samples presented to the network: $E_l(W) = \sum_{j=0}^{n-1}\omega_{lj}\,\{T_{lj} - Z_{lj}^{out}(W)\}^2$ Prof. M. Kanevski 30
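This weighted error reduces to one line of NumPy; the weight array omega is whatever importance scheme (declustering, economic criteria, etc.) one chooses.

import numpy as np

def weighted_sse(T, Z, omega):
    # E_l(W) = sum_j omega_lj * (T_lj - Z_lj)^2
    return np.sum(omega * (T - Z) ** 2)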
  • 31. Gradient descent. [Figure: error surface J(w) with the direction of the gradient J'(w) pointing away from the minimum.] Prof. M. Kanevski 31
  • 32. Gradient descent. [Figure: descent along J(w) towards the minimum.] Prof. M. Kanevski 32
  • 33. In reality the situation with error function and corresponding optimization problem is much more complicated: the presence of multiple local minima! Prof. M. Kanevski 33
  • 34. Gradient descent. [Figure: error surface with several local minima.] Prof. M. Kanevski 34
  • 36. How important are local minima? (Duda et al. 2001) In computational practice, we do not want our network to be caught in a local minimum having high training error because this usually indicates that key features of the problem have not been learned by the network. In such cases it is traditional to reinitialize the weights and train again, possibly also altering other parameters in the net Prof. M. Kanevski 36
  • 37. How important are local minima? (Duda et al. 2001) In many problems, convergence to a nonglobal minimum is acceptable, if the error is nevertheless fairly low. Furthermore, common stopping criteria demand that training terminate even before the minimum is reached, and thus it is not essential that the network be converging toward the global minimum or acceptable performance. Prof. M. Kanevski 37
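A sketch of the "reinitialize the weights and train again" heuristic mentioned on slide 36: run training from several random initializations and keep the best result. The init() and train() callables are placeholders for whatever network and training routine is being used.

def train_with_restarts(init, train, n_restarts=5):
    # init() -> fresh random weights; train(weights) -> (weights, final_error)
    best_weights, best_err = None, float("inf")
    for _ in range(n_restarts):
        weights, err = train(init())
        if err < best_err:               # keep the run that escaped the worst local minima
            best_weights, best_err = weights, err
    return best_weights, best_err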
  • 38. In short The presence of multiple minima does not necessarily present difficulties in training nets, and a few simple heuristics can often overcome such problems (see next slide) Prof. M. Kanevski 38
  • 39. Practical techniques for improving backpropagation • Activation function (sigmoid, hyperbolic tangent, …) • Scaling inputs • Training with noise (noise injection) • Initializing weights (simulated annealing) • Regularization (weight decay) • Number of hidden layers • Learning parameters (rates, momentum, …) • Cost function • … Prof. M. Kanevski 39
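Two of the techniques listed above, input scaling and weight decay, shown for a simple linear model so the gradient stays short; this is a generic illustration, not the exact recipe used in the lecture.

import numpy as np

def zscore(X):
    # scale each input variable to zero mean and unit variance
    return (X - X.mean(axis=0)) / X.std(axis=0)

def decay_gradient_step(w, X, y, eta=0.01, decay=1e-3):
    # gradient of mean squared error plus the weight-decay penalty decay * ||w||^2
    grad = -2.0 * X.T @ (y - X @ w) / len(y) + 2.0 * decay * w
    return w - eta * grad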
  • 40. Interpretation of network's outputs Consider the limit in which the size N of the training data set goes to infinity [Bishop 1995]. In this limit we can replace the finite sum over patterns in the sum-of-squares error with an integral of the form $E = \lim_{N\to\infty}\frac{1}{2N}\sum_{n=1}^{N}\sum_k \{y_k(x^n; w) - t_k^n\}^2 = \frac{1}{2}\sum_k \iint \{y_k(x; w) - t_k\}^2\, p(t_k, x)\, dt_k\, dx$ Prof. M. Kanevski 40
  • 41. Interpretation of network's outputs The network mapping is given by the conditional average of the target data, i.e. the regression of t_k conditioned on x: $y_k(x; w^*) = \langle t_k \mid x \rangle$ Prof. M. Kanevski 41
  • 43. MLP and number of layers • The problem with an MLP using a single hidden layer is that the neurons tend to interact with each other globally. In complex situations, this interaction makes it difficult to improve the approximation at one point without worsening it at some other point. • On the other hand, with two hidden layers, the approximation process becomes more manageable. Prof. M. Kanevski 43
  • 44. Two hidden layers! (Haykin) 1. Local features are extracted in the first hidden layer. Specifically, some neurons in the first hidden layer are used to partition the input space into regions, and other neurons in that layer learn the local features characterizing those regions. 2. Global features are extracted in the second layer. Specifically, a neuron in the second hidden layer combines the outputs of neurons in the first hidden layer operating on a particular region of the input space and thereby learns the global features for that region and outputs zero elsewhere. Prof. M. Kanevski 44
  • 45. Data Preprocessing • Machine learning algorithms are data-driven methods. • The quality and quantity of data is essential for training and generalization. [Diagram: Input data → Pre-processing → MLA → Post-processing → Results] Prof. M. Kanevski 45
  • 46. Types of pre-processing: 1. Linear and nonlinear transformations, e.g. input scaling/normalisation, Z-score transform, square root transform, N-score transform, etc. 2. Dimensionality reduction 3. Incorporation of prior knowledge (invariants, hints, …) 4. Feature extraction: linear/nonlinear combinations of input variables 5. Feature selection: deciding which features to use Prof. M. Kanevski 46
  • 47. Dimensionality reduction • Two approaches are available to perform dimensionality reduction: • Feature extraction: creating a subset of new features by combinations of the existing features • Feature selection: choosing a subset of all the features (the ones more informative) Prof. M. Kanevski 47
  • 48. Feature selection/extraction Prof. M. Kanevski 48
  • 49. Feature selection • Reducing the feature space by throwing out some of the features (covariates) – Also called variable selection • Motivating idea: try to find a simple, “parsimonious” model (Occam’s razor!) Prof. M. Kanevski 49
  • 50. Univariate selection may fail Guyon-Elisseeff, JMLR 2004; Springer 2006 Prof. M. Kanevski 50
  • 51. Dimensionality Reduction Clearly we lose some information, but this can be helpful due to the curse of dimensionality. We need some way of deciding which dimensions to keep: 1. Random choice 2. Principal components analysis (PCA) 3. Independent components analysis (ICA) 4. Self-organised maps (SOM) Prof. M. Kanevski 51
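A minimal PCA sketch (option 2 above) using the SVD of the centred data matrix; the number of retained components is an arbitrary choice.

import numpy as np

def pca_reduce(X, n_components=2):
    # centre the data, then project onto the leading right singular vectors
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T     # scores in the reduced space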
  • 52. Data transform • Y = aZ + b • Y = Log(Z) • Y = Ind(Z, Zs) • Normalisation: Z-score Y = (Z − Zm)/σ • Box-Cox nonlinear transform: $Y(\lambda) = \dfrac{Z^{\lambda} - 1}{\lambda}$ if λ > 0, $Y(\lambda = 0) = \ln(Z)$ Prof. M. Kanevski 52
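The transforms of this slide as small NumPy functions, assuming Z > 0 for the logarithm and Box-Cox; the convention Ind(Z, Zs) = 1 when Z ≤ Zs is an assumption, since the slide does not define it.

import numpy as np

def zscore_transform(Z):
    return (Z - Z.mean()) / Z.std()          # Y = (Z - Zm) / sigma

def indicator_transform(Z, Zs):
    return (Z <= Zs).astype(float)           # one common convention for Y = Ind(Z, Zs)

def box_cox(Z, lam):
    # Y(lambda) = (Z^lambda - 1)/lambda for lambda > 0, log(Z) when lambda = 0
    return np.log(Z) if lam == 0 else (Z ** lam - 1.0) / lam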
  • 53. Model Selection & Model Evaluation Prof. M. Kanevski 53
  • 54. Guillaume d'Occam (1285 - 1349) “Pluralitas non est ponenda sine necessitate” Occam’s razor: “The more simple explanation of the phenomena is more likely to be correct” Prof. M. Kanevski 54
  • 55. Model Assessment and Model Selection: Two separate goals Prof. M. Kanevski 55
  • 56. Model Selection: Estimating the performance of different models in order to choose the (approximate) best one Model Assessment: Having chosen a final model, estimating its prediction error (generalization error) on new data Prof. M. Kanevski 56
  • 57. If we are in a data-rich situation, the best solution is to randomly (?) split the data: Raw Data → Train: 50%, Validation: 25%, Test: 25%. Prof. M. Kanevski 57
  • 58. Interpretation • The training set is used to fit the models • The validation set is used to estimate prediction error for model selection (tuning hyperparameters) • The test set is used for assessment of the generalization error of the final chosen model Elements of Statistical Learning- Hastie, Tibshirani & Friedman 2001 Prof. M. Kanevski 58
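A random 50/25/25 split as on the previous slide, written with NumPy; the function name and the fixed seed are my additions.

import numpy as np

def split_data(X, y, seed=0):
    # random 50% / 25% / 25% split into train, validation, test
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train, n_val = int(0.5 * len(X)), int(0.25 * len(X))
    tr, va, te = idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])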
  • 59. Bias and Variance. Model's complexity. [Figure: example fits illustrating (b) overfitting and (c) underfitting.] Prof. M. Kanevski 59
  • 60. One of the most serious problems that arises in connectionist learning by neural networks is overfitting of the provided training examples. This means that the learned function fits the training data very closely; however, it does not generalise well, that is, it cannot model unseen data from the same task sufficiently well. Solution: balance the statistical bias and statistical variance when doing neural network learning in order to achieve the smallest average generalization error. Prof. M. Kanevski 60
  • 61. Bias-Variance Dilemma Assume that $Y = f(X) + \varepsilon$, where $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma_\varepsilon^2$. Prof. M. Kanevski 61
  • 62. We can derive an expression for the expected prediction error of a regression at an input point X=x0 using squared-error loss: Prof. M. Kanevski 62
  • 63. $\mathrm{Err}(x_0) = E[(Y - \hat{f}(x_0))^2 \mid X = x_0] = \sigma_\varepsilon^2 + [E\hat{f}(x_0) - f(x_0)]^2 + E[\hat{f}(x_0) - E\hat{f}(x_0)]^2 = \sigma_\varepsilon^2 + \mathrm{Bias}^2(\hat{f}(x_0)) + \mathrm{Var}(\hat{f}(x_0)) = \text{Irreducible Error} + \mathrm{Bias}^2 + \text{Variance}$ Prof. M. Kanevski 63
  • 64. • The first term is the variance of the target around its true mean f(x0), and cannot be avoided no matter how well we estimate f(x0), unless σε² = 0. • The second term is the squared bias, the amount by which the average of our estimate differs from the true mean. • The last term is the variance, the expected squared deviation of $\hat{f}(x_0)$ around its mean. Prof. M. Kanevski 64
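A small Monte Carlo illustration of this decomposition: draw many noisy training sets from Y = f(X) + ε, refit a simple model each time, and measure the squared bias and variance of its prediction at a point x0. The sine target, the polynomial fit and the sample sizes are arbitrary choices made for the illustration.

import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(2.0 * x)          # true function
sigma_eps, x0, degree = 0.3, 0.5, 3    # noise level, test point, model complexity

preds = []
for _ in range(2000):                  # many training sets of 30 points each
    X = rng.uniform(0, 1, 30)
    Y = f(X) + rng.normal(0.0, sigma_eps, 30)
    coef = np.polyfit(X, Y, degree)    # fitted model f_hat
    preds.append(np.polyval(coef, x0))
preds = np.array(preds)

bias2 = (preds.mean() - f(x0)) ** 2    # [E f_hat(x0) - f(x0)]^2
variance = preds.var()                 # E[f_hat(x0) - E f_hat(x0)]^2
print(bias2, variance, sigma_eps ** 2) # expected error = bias^2 + variance + sigma^2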
  • 65. Elements of Statistical Learning. Hastie, Tibshirani & Friedman 2001 Prof. M. Kanevski 65
  • 67. • A neural network is only as good as the training data! • Poor training data inevitably leads to an unreliable and unpredictable network. • Exploratory Data Analysis and data preprocessing are extremely important!!! Prof. M. Kanevski 67
  • 68. MLP modelling. Case studies. [Figure: original field (10 000 points) and training set (900 points).] Prof. M. Kanevski 68
  • 69. MLP modeling. [Figure: original field vs. MLP prediction.] Which result do you prefer? Training RMSE 1.97, Ro 0.69. Prof. M. Kanevski 69
  • 70. MLP modeling. [Figure: original field vs. MLP prediction.] Which result do you prefer? Training RMSE 1.61, Ro 0.80. Prof. M. Kanevski 70
  • 71. MLP modeling. [Figure: original field vs. MLP prediction.] Which result do you prefer? Training RMSE 1.67, Ro 0.79. Prof. M. Kanevski 71
  • 72. MLP modeling. [Figure: original field vs. MLP prediction.] Which result do you prefer? Training RMSE 1.10, Ro 0.92. Prof. M. Kanevski 72
  • 73. MLP modeling. [Figure: original field vs. MLP prediction.] Which result do you prefer? Training RMSE 0.83, Ro 0.95. Prof. M. Kanevski 73
  • 74. MLP modeling. [Figure: original field vs. MLP prediction.] Which result do you prefer? Training RMSE 0.55, Ro 0.98. Prof. M. Kanevski 74
  • 75. MLP modeling. Training statistics. [Figure: Ro and RMSE on the training set versus MLP architecture (5, 10, 5-5, 10-10, 15-15, 20-20).] Model 20-20 is the best? Prof. M. Kanevski 75
  • 76. MLP modeling. Training statistics. MLP 5: RMSE 1.97, Ro 0.69; MLP 10: RMSE 1.61, Ro 0.80; MLP 5-5: RMSE 1.67, Ro 0.79; MLP 10-10: RMSE 1.10, Ro 0.92; MLP 15-15: RMSE 0.83, Ro 0.95; MLP 20-20: RMSE 0.55, Ro 0.98. Prof. M. Kanevski 76
  • 77. MLP modeling. Training & validation statistics. [Figure: Ro and RMSE versus MLP architecture (5, 10, 5-5, 10-10, 15-15, 20-20) for the training and validation sets.] Prof. M. Kanevski 77
  • 78. MLP modeling. Training & validation statistics. [Same figure repeated.] Prof. M. Kanevski 78
  • 79. MLP modeling. Validation statistics. MLP 5: RMSE 2.01, Ro 0.68; MLP 10: RMSE 1.66, Ro 0.80; MLP 5-5: RMSE 1.70, Ro 0.79; MLP 10-10: RMSE 1.25, Ro 0.89; MLP 15-15: RMSE 1.24, Ro 0.89; MLP 20-20: RMSE 1.39, Ro 0.88. Prof. M. Kanevski 79
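The model-selection logic behind the two tables (training statistics on slide 76, validation statistics on slide 79), in a few lines: the numbers are copied from the slides, and the architecture with the lowest validation RMSE is preferred over the one with the lowest training RMSE.

train_rmse = {"5": 1.97, "10": 1.61, "5-5": 1.67, "10-10": 1.10, "15-15": 0.83, "20-20": 0.55}
valid_rmse = {"5": 2.01, "10": 1.66, "5-5": 1.70, "10-10": 1.25, "15-15": 1.24, "20-20": 1.39}

best_by_train = min(train_rmse, key=train_rmse.get)   # "20-20": the largest net always wins on training data
best_by_valid = min(valid_rmse, key=valid_rmse.get)   # "15-15": validation error grows again for larger nets
print(best_by_train, best_by_valid)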
  • 80. ANNEX model: Artificial Neural Networks with External drift. Environmental data mapping. Prof. M. Kanevski 80
  • 81. Traditional application of ANN to spatial predictions Data are available at measurement points: F(xi,yi), for i= 1,…N Problem: Predict F(x,y) at the points without measurements. Usually regular grid ANN solution: x,y - 2 inputs, F - output - select ANN architecture - train with available data - after training use to predict Prof. M. Kanevski 81
  • 82. ANNEX is similar to “Kriging with External Drift Model”: If there is an additional information (available at training and prediction points) related to the primary one, we can use it as an additional inputs to the ANN. Inputs: x,y,+fext(x,y) Prof. M. Kanevski 82
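A sketch of how the ANNEX inputs could be assembled, assuming the external variable (e.g., DEM altitude) is known both at the measurement points and on the prediction grid; the array names are placeholders.

import numpy as np

def annex_inputs(x, y, f_ext):
    # a plain ANN uses [x, y]; ANNEX adds the external variable as a third input
    return np.column_stack([x, y, f_ext])

# training inputs from measurement points, prediction inputs from the grid, e.g.:
# X_train = annex_inputs(x_meas, y_meas, altitude_at_meas)
# X_grid  = annex_inputs(x_grid, y_grid, altitude_on_grid)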
  • 83. Examples of external information • Cheap information on a secondary variable • Physical model of the phenomena • Remotely sensed images • GIS data • DEM data Prof. M. Kanevski 83
  • 84. Kriging with external drift Kriging with external drift is the model in which trends are limited to E{F(x,y)} = m(x,y) = λ0 + λ1 fext(x,y)  (1), where the smooth variability of the secondary variable is considered to be related (e.g., linearly correlated) to that of the primary variable F(x,y) being estimated. In general, kriging with an external drift is a simple and efficient algorithm for incorporating a secondary variable in the estimation of the primary variable. Prof. M. Kanevski 84
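For intuition, the trend coefficients λ0 and λ1 of equation (1) can be estimated by ordinary least squares as below; real kriging with external drift also models the residual covariance, which this sketch omits.

import numpy as np

def fit_external_drift(F, f_ext):
    # m(x,y) = lambda0 + lambda1 * f_ext(x,y), fitted by ordinary least squares
    A = np.column_stack([np.ones_like(f_ext), f_ext])
    (lam0, lam1), *_ = np.linalg.lstsq(A, F, rcond=None)
    return lam0, lam1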
  • 85. ANNEX model What relationship between primary and external information should be in case of ANNEX? Prof. M. Kanevski 85
  • 86. ANNEX model What does external “related” (how to measure: correlation between variables?) information bring? Improved accuracy of prediction? Reduce uncertainty of prediction? An important problem is related to the question of the quality of additional data: there is a dilemma between introducing new information and/or new noise. Prof. M. Kanevski 86
  • 87. Case study: Kazakh Priaralie, monitoring network 1 400 000 km2 - 400 monitoring stations 87 Prof. M. Kanevski
  • 88. Datasets GIS DEM model Average long-term temperatures of air in June (°C) Prof. M. Kanevski 88
  • 89. Correlation Air temperature vs. Altitude Prof. M. Kanevski 89
  • 90. Train and Test datasets Train Test Prof. M. Kanevski 90
  • 91. ANN and ANNEX models. Model (Correlation, RMSE, MAE, MRE): 2-7-5-1 (0.917, 2.57, 1.96, -0.02); 3-3-1 (0.989, 0.96, 0.73, -0.01); 3-5-1 (0.99, 0.9, 0.7, -0.007); 3-7-1 (0.991, 0.85, 0.66, -0.004); 3-8-1 (0.991, 0.84, 0.68, -0.001); 3-9-1 (0.991, 0.88, 0.69, -0.01); 3-10-1 (0.99, 0.92, 0.74, -0.01); Kriging with external drift (0.984, 1.19, 0.91, -0.03). Prof. M. Kanevski 91
  • 92. Scatter plots. [Figure: scatter plots of predictions vs. data for kriging, cokriging, drift kriging, and ANNEX.] Prof. M. Kanevski 92
  • 93. Mapping results. [Figure: prediction maps for kriging, cokriging, drift kriging, and ANNEX.] Prof. M. Kanevski 93
  • 94. Modelling the noisy "altitude" effect (100%). [Figure: altitude before and after adding noise.] Prof. M. Kanevski 94
  • 95. Scatter plots between variables (noisy 100% altitude). [Figure: train and test scatter plots.] Prof. M. Kanevski 95
  • 96. Mapping noise results ANNEX Air temperature (°C) Prof. M. Kanevski 96
  • 97. Noise results. Model (Correlation, RMSE, MAE, MRE): Kriging (0.874, 3.13, 2.04, -0.06); Kriging with external drift (0.984, 1.19, 0.91, -0.03); 3-7-1 (0.991, 0.85, 0.66, -0.004); 3-8-1 (0.991, 0.84, 0.68, -0.001); 3-8-1, 100% noise (0.839, 3.54, 2.37, -0.13); 3-7-1, 10% noise, Test 1 (0.939, 2.32, -1.49, -0.003); Kriging with external drift, 10% noise, Test 1 (0.941, 2.23, 1.54, -0.06); 3-7-1, 10% noise, Test 2 (0.899, 2.81, 1.52, -0.08); Kriging with external drift, 10% noise, Test 2 (0.903, 2.81, 1.59, -0.103). Prof. M. Kanevski 97
  • 98. MLP: real case study Wind fields in Switzerland Prof. M. Kanevski 98
  • 99. Modeling of wind fields with MLP using regularization technique (pp 168-172 of the book) Monitoring network: 111 stations in Switzerland (80 training + 31 for validation) Mapping of daily: • Mean speed • Maximum gust • Average direction Prof. M. Kanevski 99
  • 100. Modeling of wind fields with MLP and regularization technique Monitoring network: 111 stations in Switzerland (80 training + 31 for validation) Mapping of daily: • Mean speed • Maximum gust • Average direction Input information: X,Y geographical coordinates DEM (resolution 500 m) 23 DEM-based « geo-features » Total 26 features Model: MLP 26-20-20-3 Prof. M. Kanevski 100
  • 101. Training of the MLP Model: MLP 26-20-20-3 Training: • Random initialization • 500 iterations of the RPROP algorithm Prof. M. Kanevski 101
  • 102. Results: naïve approach Prof. M. Kanevski 102
  • 103. Results: noise injection regularization Prof. M. Kanevski 103
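A sketch of the noise injection regularization listed among the practical techniques on slide 39 and used here: Gaussian noise is added to the inputs at every training pass. The train_step callable stands for one pass of whatever optimizer is used (the lecture uses 500 iterations of RPROP), and the noise level is an assumption.

import numpy as np

def train_with_noise_injection(X, y, train_step, n_epochs=500, noise_std=0.05, seed=0):
    # train_step(X_noisy, y) performs one pass of weight updates (e.g. RPROP or backprop)
    rng = np.random.default_rng(seed)
    for _ in range(n_epochs):
        X_noisy = X + rng.normal(0.0, noise_std, size=X.shape)  # jitter the inputs every epoch
        train_step(X_noisy, y)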
  • 104. Results: summary. Noise injection regularization vs. without regularization (overfitting). Prof. M. Kanevski 104
  • 105. Conclusion • MLP is a universal nonlinear tool for learning from and modeling data. An excellent exploratory tool. • Its application demands deep expert knowledge and experience. Prof. M. Kanevski 105