Intelligent analysis of environmental data: an introduction. Mikhail Kanevski – Institute of Geomatics and Risk Analysis (IGAR), University of Lausanne (Switzerland)

Intelligent Analysis of Environmental Data (S4 ENVISA Workshop 2009)

  1. International Workshop: Intelligent Analysis of Environmental Data. Institute of Geomatics and Analysis of Risk (IGAR), University of Lausanne, Switzerland. Prof. Mikhail Kanevski.
  2. Comments and questions to: Mikhail.Kanevski@unil.ch – www.unil.ch/igar – www.geokernels.org
  3. General Introduction: typical problems, approaches, solutions, future research.
  4. Geo- and Environmental Data (classes, continuous variables, images, networks, geomanifolds, …): spatio-temporal; multi-scale; multivariate; highly variable at many scales; high-dimensional geo-feature spaces; uncertainties; … In some cases we do have science-based models, which calls for data/knowledge/model integration.
  5. Spatio-temporal data in terms of patterns/structures: (a) pattern recognition (pattern discovery, pattern extraction); (b) pattern modelling; (c) pattern prediction.
  6. Main Topics: review and posing of typical problems; from “numbers” to data; collection of data – monitoring networks and data representativity, monitoring network optimisation; getting more information value from your data (EXPLORE!) – exploratory spatio-temporal data analysis (EDA, ESDA); predictions/estimations or simulations? Risk analysis and mapping; letting the data speak for themselves – learning from data, data mining, machine learning.
  7. Methods: monitoring network descriptions; geostatistics – predictions/simulations; machine learning (neural nets, SLT) – neural networks (MLP, PNN, GRNN, RBF, SOM), ANNEX models, hybrid models, support vector machines; recent trends in geostatistics – multiple-point geostatistics, pattern-based geostatistics; Bayesian approach for uncertainty assessment and for integrating data with science-based models (Bayesian Maximum Entropy).
  8. Spatial data analysis, typical tasks: predict a value at a given point; build a map (isolines, 3D surfaces, …); estimate the prediction error; take measurement errors into account; risk mapping – uncertainty mapping around an unknown value, estimating the probability of exceeding a given decision level; joint prediction of several variables (improving predictions of the primary variable using auxiliary data and information); optimisation of the monitoring network (design/redesign); simulations – modelling of spatial uncertainty and variability; assimilation/fusion of data and science-based models; image analysis and remote sensing; spatio-temporal events (forest fires, epidemiology, crime, …); predictions/simulations in high-dimensional spaces; …
  9. Generic Methodology (workflow diagram): data and database management system; monitoring network analysis; statistical description and quick visualisation; variography; deterministic interpolations; monitoring network generation; cross-validation; machine learning algorithms; geostatistical predictions and simulations; decision-oriented mapping; GIS and remote sensing.
  10. Geostatistical Analysis: basic/naïve statistical analysis, EDA; ESDA (regionalised EDA); structural analysis – spatial correlation analysis (variography); model selection – cross-validation, jack-knife, …; prediction and error mapping for decision making (family of kriging models); probability and risk mapping; conditional stochastic simulations.
  11. Some Geostatistics: exploration of spatial correlations; family of kriging models (simple, ordinary, disjunctive, indicator, …); conditional stochastic simulations.
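As an illustration of the kriging family mentioned above, here is a minimal ordinary-kriging sketch in Python (NumPy only). The exponential variogram model, its sill/range parameters and the toy data are assumptions for illustration, not the models used in the workshop case studies.

```python
import numpy as np

def exp_variogram(h, sill=1.0, vrange=30.0, nugget=0.0):
    """Exponential semivariogram model gamma(h)."""
    return nugget + sill * (1.0 - np.exp(-3.0 * h / vrange))

def ordinary_kriging(coords, values, x0, **vario):
    """Ordinary kriging estimate and variance at a single location x0."""
    n = len(values)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

    # Kriging system in semivariogram form, with the Lagrange multiplier row/column
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = exp_variogram(d, **vario)
    A[n, n] = 0.0
    b = np.append(exp_variogram(np.linalg.norm(coords - x0, axis=1), **vario), 1.0)

    w = np.linalg.solve(A, b)              # n weights + Lagrange multiplier
    estimate = w[:n] @ values
    variance = w @ b                       # ordinary-kriging variance
    return estimate, variance

# Toy data: five 2-D samples, estimate at (4, 4)
coords = np.array([[0., 0.], [10., 0.], [0., 10.], [10., 10.], [5., 7.]])
values = np.array([1.2, 0.8, 1.5, 0.9, 1.1])
print(ordinary_kriging(coords, values, np.array([4., 4.]), sill=1.0, vrange=15.0))
```

The other members of the family (simple, indicator, disjunctive kriging) change the system and the transformation of the data, not the overall structure of the computation.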
  12. Briansk region (radioactivity, Cs137) (map).
  13. Heavy metals, Japan (map).
  14. Switzerland, indoor radon (map).
  15. Measures to characterise monitoring networks (MN): topological; statistical; fractal/multifractal; lacunarity.
  16. Preferential sampling: the declustering problem.
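A minimal sketch of cell declustering for preferentially sampled data: each sample receives a weight inversely proportional to the number of samples falling in its grid cell. The cell size and the clustered toy data are placeholders chosen only for illustration.

```python
import numpy as np

def cell_declustering_weights(coords, cell_size):
    """Weights inversely proportional to the number of samples per grid cell."""
    cells = np.floor((coords - coords.min(axis=0)) / cell_size).astype(int)
    _, inverse, counts = np.unique(cells, axis=0,
                                   return_inverse=True, return_counts=True)
    w = 1.0 / counts[inverse]
    return w * len(coords) / w.sum()       # normalise so the weights sum to n

# Clustered toy data: a dense cluster plus a few isolated samples
rng = np.random.default_rng(0)
coords = np.vstack([rng.normal([2, 2], 0.2, size=(20, 2)),
                    rng.uniform(0, 10, size=(5, 2))])
values = rng.random(len(coords))
w = cell_declustering_weights(coords, cell_size=1.0)
print("naive mean:      ", values.mean())
print("declustered mean:", np.average(values, weights=w))
```

In practice the cell size is varied and the most stable (or most pessimistic) declustered statistic is retained.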
  17. Example, geostatistical spatial co-predictions: Sr90 – “expensive” information; Cs137 – “cheap”, exhaustive information.
  18. (Cross-)variography.
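A minimal sketch of an experimental (semi)variogram, i.e. half the average squared increment per distance lag; the lag width and toy data are assumptions. A cross-variogram for co-prediction would replace the squared increment of one variable by the product of the increments of the two variables.

```python
import numpy as np

def experimental_variogram(coords, values, lag_width, n_lags):
    """Omnidirectional experimental semivariogram: gamma(h) = 0.5 * mean[(z_i - z_j)^2]."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = 0.5 * (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)          # each pair counted once
    d, sq = d[iu], sq[iu]

    lags, gammas = [], []
    for k in range(n_lags):
        mask = (d >= k * lag_width) & (d < (k + 1) * lag_width)
        if mask.any():
            lags.append(d[mask].mean())
            gammas.append(sq[mask].mean())
    return np.array(lags), np.array(gammas)

# Toy data with a spatial structure along the x coordinate
rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(200, 2))
values = np.sin(coords[:, 0] / 20.0) + 0.1 * rng.normal(size=200)
lags, gammas = experimental_variogram(coords, values, lag_width=5.0, n_lags=10)
print(np.c_[lags, gammas])
```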
  19. Use of Cs137 to improve Sr90 predictions (reduced errors and uncertainty). Decision-oriented mapping: “thick isolines”.
  20. Simulations and interpolations.
  21. Unconditional simulations.
  22. Sequential Gaussian simulation (SGSim) of precipitation.
  23. Results of the simulations.
  24. Post-processing of the simulations: mean and standard deviation.
  25. Geostatistics, some comments: geostatistics is a powerful and well elaborated model-dependent approach; it offers a variety of models for spatial data analysis and modelling and has a long and successful history of developments and applications. Some problems: nonlinearity, non-stationarity, two-point statistics, data/model integration, data mining and pattern recognition. Hybrid models (ANN/SVM + geostatistics) can help.
  26. Some useful comments, conclusions and future research: detection of patterns – try k-NN or GRNN as exploratory tools; cross-validation (leave-one-out, leave-k-out, jackknife, etc.) as a control tool; model selection and model assessment.
  27. k-Nearest Neighbours.
  28. k-NN prediction: NN methods use the k observations in the training set T closest in input space to the prediction point x to estimate Y: Ŷ(x) = (1/k) Σ_{xi ∈ Nk(x)} yi, where Nk(x) is the neighbourhood of x defined by the k closest points in the training set.
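The k-NN estimate above is simply the average of the k nearest training values; a minimal NumPy sketch with Euclidean distance and toy data (both assumptions for illustration):

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=5):
    """k-NN regression: average of the k nearest training values (Euclidean distance)."""
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]            # indices of the k nearest neighbours
    return y_train[idx].mean(axis=1)

# Toy example
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(100, 2))
y_train = np.sin(X_train[:, 0]) + 0.1 * rng.normal(size=100)
X_query = np.array([[5.0, 5.0], [1.0, 9.0]])
print(knn_predict(X_train, y_train, X_query, k=7))
```

For classification (next slide) the mean is replaced by a majority vote over the labels of the k neighbours.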
  29. k-NN classifiers: these classifiers are memory-based and do not require any model to be fitted. Given a query point x, we find the k training points closest in distance to x and then classify by a MAJORITY vote among the k neighbours.
  30. Because it uses only the training point closest to the query point, the bias of the 1-NN estimate is often low, but the variance is high. A famous result of Cover and Hart (1967) shows that asymptotically the error rate of the 1-NN classifier is never more than twice the Bayes rate. This result gives a rough idea of the best performance possible in a given problem: if the 1-NN rule has a 10% error rate, then asymptotically the Bayes error rate is at least 5%.
  31. Dirichlet cells, Thiessen tessellation, Voronoï polygons.
  32. How to find k? A possible answer: cross-validation, e.g. leave-one-out.
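A minimal leave-one-out cross-validation sketch for choosing k, using scikit-learn; the toy data, the range of k values and the use of mean squared error as the score are assumptions for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=100)

errors = []
for k in range(1, 21):
    model = KNeighborsRegressor(n_neighbors=k)
    # scikit-learn returns the negative MSE; flip the sign to get an error curve
    mse = -cross_val_score(model, X, y, cv=LeaveOneOut(),
                           scoring="neg_mean_squared_error").mean()
    errors.append(mse)

best_k = int(np.argmin(errors)) + 1
print("LOO error curve:", np.round(errors, 3))
print("best k:", best_k)
```

The resulting curve is the cross-validation error curve shown on the following slides: too small a k overfits (high variance), too large a k oversmooths (high bias).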
  33. k-NN prediction (n = 6?) (diagram): a prediction point surrounded by six neighbours at distances r1, …, r6, each carrying weight Wi ≈ 1/n.
  34. Cross-validation (diagram): the same configuration with one sample left out; calculate the error = (prediction − data).
  35. Leave-next-one-out, etc. (diagram): the same configuration with the next sample left out; calculate the error = (prediction − data).
  36. Data and k-NN cross-validation error curve (figure).
  37. Complete data set and 500 training points, linearly interpolated (figure).
  38. Cross-validation curve (figure).
  39. k-NN predictions (figure).
  40. Machine Learning Algorithms: machine learning is an area of artificial intelligence concerned with the development of techniques which allow computers to “learn”. More specifically, machine learning is a method for creating computer programs by the analysis of data sets. Machine learning overlaps heavily with statistics, since both fields study the analysis of data, but unlike statistics, machine learning is concerned with the algorithmic complexity of computational implementations.
  41. Algorithms. Common algorithm types include: supervised learning, where the algorithm generates a function that maps inputs to desired outputs; unsupervised learning, which models a set of inputs when labelled examples are not available; semi-supervised learning, which combines labelled and unlabelled examples to generate an appropriate function or classifier; reinforcement learning, where the algorithm learns a policy of how to act given an observation of the world – every action has some impact on the environment, and the environment provides feedback that guides the learning algorithm; transduction, which is similar to supervised learning but does not explicitly construct a function and instead tries to predict new outputs from the training inputs, training outputs and new inputs. The performance and computational analysis of machine learning algorithms is a branch of statistics known as computational learning theory.
  42. ML topics (short list) – modelling conditional probability density functions, regression and classification: artificial neural networks; decision trees; gene expression programming; genetic programming; Gaussian process regression; linear discriminant analysis; k-nearest neighbour; minimum message length; perceptron; quadratic classifier; radial basis functions; support vector machines.
  43. ML topics (continued) – modelling probability density functions through generative models: expectation-maximisation algorithm, graphical models (including Bayesian networks and Markov random fields), generative topographic mapping. Approximate inference techniques: Markov chain Monte Carlo, variational Bayes. Meta-learning (ensemble methods): boosting, bootstrap aggregating (bagging), random forests, weighted majority algorithm. Optimisation: most of the methods listed above either use optimisation or are instances of optimisation algorithms. Multi-objective machine learning: an approach that addresses multiple, often conflicting, learning objectives explicitly using Pareto-based multi-objective optimisation techniques.
  44. Machine Learning: artificial neural networks – multilayer perceptrons (MLP), general regression neural networks (GRNN); statistical learning theory – support vector classification, support vector regression, monitoring network optimisation.
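The GRNN mentioned above is essentially a Nadaraya-Watson kernel regression with a single bandwidth (sigma) to tune. A minimal sketch follows; the Gaussian kernel, the value of sigma and the toy data are assumptions for illustration.

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma=1.0):
    """General Regression Neural Network (Nadaraya-Watson kernel regression)."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2))          # Gaussian kernel weights
    return (w @ y_train) / w.sum(axis=1)          # weighted average of training outputs

rng = np.random.default_rng(1)
X_train = rng.uniform(0, 10, size=(200, 2))
y_train = np.cos(X_train[:, 1]) + 0.1 * rng.normal(size=200)
print(grnn_predict(X_train, y_train, np.array([[5.0, 5.0]]), sigma=0.8))
```

Sigma plays the same role as k in k-NN and is usually tuned by the same cross-validation procedure.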
  45. A generic model of learning from data/examples (diagram): Generator, Supervisor, Learning Machine.
  46. The problem of risk minimisation: in order to choose the best available approximation to the supervisor’s response, one measures the loss, or discrepancy, L(y, f(x, α)) between the response y of the supervisor to a given input x and the response f(x, α) provided by the learning machine.
  47. Three main learning problems – regression estimation: let the supervisor’s answer y be a real value, and let f(x, α), α ∈ Λ, be a set of real functions that contains the regression function f(x, α0) = ∫ y dF(y | x).
  48. The problem of risk minimisation: consider the expected value of the loss, given by the risk functional R(α) = ∫ L(y, f(x, α)) dF(x, y). The goal is to find the function f(x, α0) which minimises the risk in the situation where the joint pdf F(x, y) is unknown and the only available information is contained in the training set.
  49. Classification problem (diagram): two classes of points, A and B, scattered in the input space.
  50. Three main learning problems – pattern recognition (classification): y ∈ {0, 1}, with the classification error loss L(y, f(x, α)) = 0 if y = f(x, α) and 1 if y ≠ f(x, α).
  51. Regression problem (diagram): from samples x → y, estimate the unknown function f(x).
  52. Three main learning problems – regression estimation: it is known that the regression function is the one which minimises the loss function L(y, f(x, α)) = (y − f(x, α))².
  53. Probability density estimation (diagram): estimate p(x) from the samples.
  54. Three main learning problems – density estimation: for this problem we consider the loss function L(p(x, α)) = −log p(x, α).
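In practice the unknown risk functional is replaced by the empirical risk over the training set. A minimal sketch of the three loss functions above and their empirical averages; the toy labels, targets and density values are arbitrary placeholders.

```python
import numpy as np

def classification_loss(y, f):      # 0/1 loss
    return (y != f).astype(float)

def regression_loss(y, f):          # squared error
    return (y - f) ** 2

def density_loss(p):                # negative log-likelihood
    return -np.log(p)

# Toy values: true labels/targets, model outputs, and model densities at the samples
y_cls, f_cls = np.array([0, 1, 1, 0]), np.array([0, 1, 0, 0])
y_reg, f_reg = np.array([1.2, 0.7, 2.0]), np.array([1.0, 0.9, 1.8])
p_x = np.array([0.2, 0.05, 0.4])

print("empirical classification risk:", classification_loss(y_cls, f_cls).mean())
print("empirical regression risk:    ", regression_loss(y_reg, f_reg).mean())
print("empirical density risk:       ", density_loss(p_x).mean())
```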
  55. Inductive, deductive and transductive inference (diagram): induction goes from the training samples (xi, yi) to a function F(x, y); deduction goes from F(x, y) to predictions (xnew, ynew); transduction goes directly from the training samples to the new predictions.
  56. Why machine learning algorithms? Universal, nonlinear, robust tools; data-adapted; easy data and knowledge integration; efficient in high-dimensional spaces; good generalisation (low prediction error); input/feature selection.
  57. Our experience, some applications: hydrogeology, pollution/contamination (soil, water, air, food chains, …), topo-climatic modelling, geophysics; renewable resources (wind fields); natural hazards/risks (forest fires, avalanches, indoor radon); optimisation of monitoring networks; crime data, epidemiology; MLA for remote sensing, change detection; socio-economic spatio-temporal multivariate data; spatial econometrics, financial data, econophysics; fractals, chaos, EVT; time series.
  58. Model selection and model evaluation.
  59. Guillaume d’Occam (1285–1349): “Pluralitas non est ponenda sine necessitate”. Occam’s razor: “The simpler explanation of the phenomena is more likely to be correct.”
  60. Model assessment and model selection: two separate goals.
  61. Model selection: estimating the performance of different models in order to choose the (approximately) best one. Model assessment: having chosen a final model, estimating its prediction error (generalisation error) on new data.
  62. If we are in a data-rich situation, the best solution is to split the data randomly (?) into three parts: training (50%), validation (25%) and test (25%).
  63. Interpretation: the training set is used to fit the models; the validation set is used to estimate the prediction error for model selection (tuning hyperparameters); the test set is used to assess the generalisation error of the final chosen model. [Elements of Statistical Learning, Hastie, Tibshirani & Friedman, 2001]
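A minimal sketch of the random 50/25/25 split described above, using scikit-learn's train_test_split twice; the proportions come from the slide, the toy data and random seed are placeholders.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = rng.normal(size=1000)

# 50% train, then split the remaining 50% equally into validation and test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.5, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))    # 500 250 250
```

The “(?)” on the slide is a reminder that a purely random split can be questionable for clustered or preferentially sampled spatial data, where spatially stratified splits may be preferable.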
  64. Bias and variance; model complexity (figures): (b) overfitting, (c) underfitting.
  65. One of the most serious problems that arises in connectionist learning with neural networks is overfitting of the provided training examples: the learned function fits the training data very closely but does not generalise well, i.e. it cannot model unseen data from the same task sufficiently well. Solution: balance the statistical bias and statistical variance during neural network learning in order to achieve the smallest average generalisation error.
  66. Bias–variance dilemma. Assume that Y = f(X) + ε, where E(ε) = 0 and Var(ε) = σ²ε.
  67. We can derive an expression for the expected prediction error of a regression fit at an input point X = x0, using squared-error loss:
  68. Err(x0) = E[(Y − f̂(x0))² | X = x0] = σ²ε + [E f̂(x0) − f(x0)]² + E[f̂(x0) − E f̂(x0)]² = σ²ε + Bias²(f̂(x0)) + Var(f̂(x0)) = Irreducible Error + Bias² + Variance.
  69. The first term is the variance of the target around its true mean f(x0) and cannot be avoided no matter how well we estimate f(x0), unless σ²ε = 0. The second term is the squared bias, the amount by which the average of our estimate differs from the true mean. The last term is the variance, the expected squared deviation of f̂(x0) around its mean.
  70. For the k-NN regression fit: Err(x0) = E[(Y − f̂k(x0))² | X = x0] = σ²ε + [f(x0) − (1/k) Σ_{l=1..k} f(x(l))]² + σ²ε / k. Here we assume for simplicity that the training inputs are fixed and the randomness arises from Y. The number of neighbours k is inversely related to the model complexity.
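The k-NN decomposition above can be checked numerically. A minimal Monte-Carlo sketch that, for a fixed design and a known f, estimates the squared bias and the variance of the k-NN fit at a point x0; the sine function, noise level and input grid are assumptions chosen only to make the trade-off visible.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(x)                      # known "true" function
x = np.linspace(0, 10, 200)                  # fixed training inputs
x0, sigma, n_rep = 5.0, 0.3, 2000

order = np.argsort(np.abs(x - x0))           # neighbours of x0, nearest first
for k in (1, 5, 20, 50):
    idx = order[:k]
    # each replicate: simulate noisy observations at the k nearest inputs and average them
    preds = np.array([(f(x[idx]) + sigma * rng.normal(size=k)).mean()
                      for _ in range(n_rep)])
    bias2 = (preds.mean() - f(x0)) ** 2
    var = preds.var()
    print(f"k={k:3d}  bias^2={bias2:.4f}  var={var:.4f}  theory var={sigma**2 / k:.4f}")
```

As k grows, the variance shrinks like σ²ε/k while the squared bias grows, exactly as the formula on the slide predicts.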
  71. Figure from Elements of Statistical Learning, Hastie, Tibshirani & Friedman, 2001.
  72. (Figure-only slide.)
  73. A neural network is only as good as the training data! Poor training data inevitably leads to an unreliable and unpredictable network. Exploratory data analysis and data preprocessing are extremely important!
  74. If possible, prior to training, add some noise or other randomness to your examples (such as a random scaling factor). This helps to account for noise and natural variability in real data, and tends to produce a more reliable network.
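A minimal sketch of the suggested noise injection: augmenting the training set with jittered and randomly rescaled copies of the examples. The noise levels and the number of copies are arbitrary assumptions that would be tuned for the data at hand.

```python
import numpy as np

def augment_with_noise(X, y, n_copies=3, jitter=0.01, scale_range=(0.95, 1.05), seed=0):
    """Add jittered / randomly rescaled copies of the training examples."""
    rng = np.random.default_rng(seed)
    X_aug, y_aug = [X], [y]
    for _ in range(n_copies):
        scale = rng.uniform(*scale_range, size=(len(X), 1))   # random scaling factor
        X_aug.append(X * scale + jitter * rng.normal(size=X.shape))
        y_aug.append(y)                                        # targets unchanged
    return np.vstack(X_aug), np.concatenate(y_aug)

X = np.random.rand(100, 3)
y = np.random.rand(100)
X_big, y_big = augment_with_noise(X, y)
print(X_big.shape, y_big.shape)              # (400, 3) (400,)
```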
  75. Hybrid models: geostatistics + ML.
  76. NNRK/CK algorithm (flowchart): raw data F1, F2, …, Fn; statistical description, trend analysis and structural analysis (variography); split of the data into training, validation and testing sets; choice of the ANN architecture and ANN training (validation, testing, accuracy test); ANN estimates for F1, F2, …, Fn and their residuals; variogram modelling of the residuals (cross-validation); cokriging of the residuals; final estimates and error estimates (ANN + geostatistics).
  77. Model: neural network residual cokriging (NNRCK). Final estimate of 90Sr = artificial neural network estimate + geostatistical estimate of the residuals.
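A minimal sketch of the neural network residual kriging/cokriging idea: a neural network models the large-scale trend from the coordinates, and a geostatistical model of the residuals adds back the spatially correlated part. Here a scikit-learn Gaussian process (with an RBF kernel) stands in for the (co)kriging of the residuals, and the MLP size, kernel parameters and toy data are assumptions, not the workshop's actual NNRCK configuration.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(300, 2))                      # sample locations
z = (0.05 * coords[:, 0] + np.sin(coords[:, 1] / 10.0)           # trend + spatial structure
     + 0.1 * rng.normal(size=300))                               # + noise

# 1) A neural network models the large-scale trend from the coordinates
trend = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(20, 20),
                                   max_iter=5000, random_state=0))
trend.fit(coords, z)
residuals = z - trend.predict(coords)

# 2) A Gaussian process (kriging analogue) models the spatially correlated residuals
gp = GaussianProcessRegressor(kernel=RBF(length_scale=10.0) + WhiteKernel(),
                              normalize_y=True)
gp.fit(coords, residuals)

# 3) Final estimate = trend + interpolated residual
new_points = rng.uniform(0, 100, size=(5, 2))
print(trend.predict(new_points) + gp.predict(new_points))
```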
  78. Conclusions: machine learning is a universal, recently developed, data-driven approach with many successful applications; nonlinear and robust; allows integration of different types of data and information; efficient in high-dimensional spaces. But: it depends on the quality and quantity of the data, and uncertainty characterisation, diagnostic tools and hyper-parameter tuning remain open issues.
  79. Topics for research: multitask learning; automatic feature selection/feature extraction; characterisation of uncertainties; understanding and visualisation of high-dimensional data; modelling on geomanifolds, semi-supervised learning; active learning; MLA and simulations; …
  80. Thank you for your attention! www.geokernels.org – www.unil.ch/igar (books: 2004, 2008, 2009).
