Mapping and classification of spatial data using machine learning: algorithms and software tools Vadim Timonin – Institute of Geomatics and Risk Analysis (IGAR), University of Lausanne (Switzerland)

Intelligent Analysis of Environmental Data (S4 ENVISA Workshop 2009)


  1. Mapping and classification of spatial data using Machine Learning Office software tools
     Vadim Timonin, Institute of Geomatics and Analysis of Risk, University of Lausanne, Switzerland
     Vadim.Timonin@UNIL.ch
  2. Contents
     • Short description of the Machine Learning Office
     • SIC 2004: application to the automatic cartography of radioactivity
     • Case study: wind field mapping with a neural network and a regularization technique
  3. Machine Learning Office
     Part of the book: EPFL Press, June 2009
  4. June 20, 09:00 – 12:00, Room T120
     Practical work session using Machine Learning software
  5. Machine Learning Office: Supervised
     Regression
     • Multilayer Perceptron (MLP)
     • General Regression Neural Networks (GRNN)
     • Radial Basis Function Neural Networks (RBFNN)
     • K-Nearest Neighbour (KNN)
     • Support Vector Regression (SVR)
     Classification
     • Multilayer Perceptron (MLP)
     • Probabilistic Neural Networks (PNN)
     • K-Nearest Neighbour (KNN)
     • Support Vector Machines (SVM)
  6. Machine Learning Office: Unsupervised
     Clustering & density estimation
     • K-Means & EM algorithms
     • Gaussian Mixture Model (GMM)
     • Self-Organizing (Kohonen) Maps (SOM)
  7. Machine Learning Office: Mixture of supervised and unsupervised
     Joint density estimation
     • Mixture Density Networks (MDN)
  8. Automatic Mapping of Pollution Data
     The procedure should be:
     1. Simple, without difficult tuning of the models (usable by a “non-expert” in machine learning)
     3. Unique in its result (independent of training algorithms, initial values, etc.)
  9. Automatic Mapping of Pollution Data
     Good candidates:
     1. KNN
     2. GRNN / PNN
     Not so good candidates (?):
     1. MLP
     2. RBFNN
     3. SVM / SVR
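Why KNN meets the requirements of slide 8 can be seen in a minimal sketch: the only parameter to tune is k, chosen here by leave-one-out cross-validation, and the result is fully determined by the data. The function names and the toy grid are illustrative assumptions and are not taken from Machine Learning Office.

```python
import math

def knn_predict(train, query, k):
    """Average the values of the k training points nearest to `query`.

    `train` is a list of ((x, y), value) pairs.
    """
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    return sum(value for _, value in nearest) / len(nearest)

def loo_mae(train, k):
    """Leave-one-out mean absolute error for a given k."""
    total = 0.0
    for i, (xy, value) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        total += abs(knn_predict(rest, xy, k) - value)
    return total / len(train)

def select_k(train, candidates):
    """Pick the k with the lowest leave-one-out error (ties go to the smallest k)."""
    return min(candidates, key=lambda k: (loo_mae(train, k), k))

# Hypothetical toy data: values on a 5 x 5 grid following z = x + y.
toy = [((float(x), float(y)), float(x + y)) for x in range(5) for y in range(5)]
best_k = select_k(toy, candidates=range(1, 9))
estimate = knn_predict(toy, (2.5, 2.5), best_k)
```

Because the sort is stable and no random initialization is involved, repeated runs on the same data return the same k and the same estimate.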
  10. Automatic Mapping with Prior Knowledge in Situations of Routine and Emergency
      Spatial Interpolation Comparison 2004
      http://www.ai-geostats.org/
      Official report: Automatic mapping algorithms for routine and emergency monitoring data. EUR 21595 EN EC. Dubois G. (Ed.), Office for Official Publications of the European Communities, Luxembourg, 150 p., November 2005.
  11. Spatial Interpolation Comparison 2004: Introduction
      Description of the concept of SIC 2004
      Participants are invited to use 200 observations (left, circles) to estimate (predict) the values at 1008 locations (right, crosses).
  12. Spatial Interpolation Comparison 2004: Introduction
      Prior data sets
      From these 1008 monitoring locations, a single sampling scheme of 200 monitoring stations was selected randomly and extracted for each of the 10 datasets, so that participants could train and design their algorithms. The spatial distribution of these 200 sampling locations can be considered nearly random. The summary statistics show that the subsets of 200 points are representative of the whole set of 1008 points. Note that it is the participant's choice whether or not to use this prior information for modeling.

               Statistics for the training sets (n = 200)   Statistics for the full sets (n = 1008)
      Set     Min    Mean  Median     Max  Std.Dev         Min    Mean  Median     Max  Std.Dev
        1    55.8    97.6    98.0   150.0     19.1        55.0    98.9    99.5   193.0     21.1
        2    55.9    97.4    97.9   155.0     19.3        54.9    98.8    99.5   188.0     21.2
        3    59.9    98.8   100.0   157.0     18.5        59.9   100.3   101.0   192.0     20.4
        4    56.1    93.8    94.8   152.0     16.8        56.1    95.1    95.4   180.0     18.8
        5    56.4    92.4    92.0   143.0     16.6        56.1    93.7    94.0   168.0     18.1
        6    54.4    89.8    90.4   133.0     15.9        54.4    90.9    91.6   168.0     17.2
        7    56.1    91.7    91.7   140.0     16.2        56.1    92.5    92.9   166.0     16.9
        8    54.9    92.4    92.5   148.0     16.6        54.9    93.5    94.1   176.0     18.1
        9    56.5    96.6    97.0   149.0     18.2        56.5    97.8    98.7   183.0     19.9
       10    54.9    95.4    95.7   152.0     17.2        54.9    96.6    97.1   183.0     19.0
  13. Results of the GRNN models with cross-validation tuning
      [Maps: emergency (joker) scenario and routine scenario; the epicentre of the accident (hot spot) is marked]
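The idea behind a GRNN tuned by cross-validation can be sketched as follows. A GRNN estimate is a Gaussian-kernel weighted mean of the training values, so the kernel width is the only free parameter; here it is picked from a small candidate list by leave-one-out error. This is a minimal sketch with a single isotropic width; the function names, the candidate widths, and the toy grid are assumptions, and the actual Machine Learning Office implementation may differ.

```python
import math

def grnn_predict(train, query, sigma):
    """GRNN estimate: Gaussian-kernel weighted mean of the training values.

    `train` is a list of ((x, y), value) pairs; `sigma` is the kernel width.
    """
    num = den = 0.0
    for xy, value in train:
        weight = math.exp(-math.dist(xy, query) ** 2 / (2.0 * sigma ** 2))
        num += weight * value
        den += weight
    if den == 0.0:
        # All weights underflowed: fall back to the plain mean of the data.
        return sum(value for _, value in train) / len(train)
    return num / den

def loo_mse(train, sigma):
    """Leave-one-out mean squared error for a given kernel width."""
    err = 0.0
    for i, (xy, value) in enumerate(train):
        rest = train[:i] + train[i + 1:]
        err += (grnn_predict(rest, xy, sigma) - value) ** 2
    return err / len(train)

def tune_sigma(train, sigmas):
    """Return the candidate width with the lowest leave-one-out error."""
    return min(sigmas, key=lambda s: loo_mse(train, s))

# Hypothetical toy data: values on a 5 x 5 grid following z = x + y.
toy = [((float(x), float(y)), float(x + y)) for x in range(5) for y in range(5)]
sigmas = [0.3, 0.5, 1.0, 2.0, 3.0]
best_sigma = tune_sigma(toy, sigmas)
center = grnn_predict(toy, (2.0, 2.0), best_sigma)
```

No random initialization or iterative training is involved, which is why the result for a given dataset and candidate list is unique.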
  14. Results
      The following table presents the participants’ results for each of the two scenarios (routine and emergency). The results are sorted by the Mean Absolute Error (MAE) obtained in the emergency scenario. The other statistics shown are the Mean Error (ME), which indicates the bias of the results, the Root Mean Squared Error (RMSE), and Pearson’s correlation coefficient (Ro) between true and estimated values.
      Method abbreviations:
      • GEOSTATS: geostatistical techniques
      • NN: neural networks
      • SVM: support vector machine
      In each column, the best results have been bolded.
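The four statistics used to rank the participants can be computed as in the sketch below. The sign convention for the error (estimate minus true value) is an assumption, as are the function name and the toy numbers.

```python
import math

def error_statistics(true_values, estimates):
    """Return MAE, ME, RMSE and Pearson's correlation (Ro) for paired values."""
    n = len(true_values)
    # Assumption: error = estimate minus true value, so ME < 0 means underestimation.
    errors = [e - t for t, e in zip(true_values, estimates)]
    mae = sum(abs(err) for err in errors) / n
    me = sum(errors) / n
    rmse = math.sqrt(sum(err * err for err in errors) / n)
    mean_t = sum(true_values) / n
    mean_e = sum(estimates) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true_values, estimates))
    var_t = sum((t - mean_t) ** 2 for t in true_values)
    var_e = sum((e - mean_e) ** 2 for e in estimates)
    ro = cov / math.sqrt(var_t * var_e)
    return {"MAE": mae, "ME": me, "RMSE": rmse, "Ro": ro}

# Hypothetical toy values, not SIC 2004 data.
stats = error_statistics([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 6.0])
```

RMSE weights large errors more heavily than MAE does, so an RMSE far above the MAE suggests that a few large errors dominate.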
  15. Results of the SIC 2004 exercise (authors are highlighted)

      Participant  Method         MAE      MAE       ME       ME     RMSE     RMSE       Ro       Ro
                              routine    joker  routine    joker  routine    joker  routine    joker
      Timonin      NN            9.40    14.85    -1.25    -0.51    12.59    45.46     0.78     0.84
      Fournier     GEOSTATS      9.06    16.22    -1.32    -8.58    12.43    81.44     0.79     0.27
      Pozdnoukhov  SVM           9.22    16.25    -0.04    -6.70    12.47    81.00     0.79     0.28
      Saveliev     SPLINES       9.60    17.00     3.00    10.40    13.00    82.20     0.77     0.23
      Dutta        NN            9.92    17.50     0.20     5.10    13.10    80.60     0.76     0.29
      Ingram       GEOSTATS      9.10    18.55    -1.27    -4.64    12.46    54.22     0.79     0.86
      Hofierka     SPLINES       9.10    18.62    -1.30     0.41    12.51    73.68     0.79     0.50
      Hofierka     SPLINES       9.10    18.62    -1.30     0.41    12.51    73.68     0.79     0.50
      Fournier     GEOSTATS      9.22    19.43    -0.89    -0.22    12.51    73.50     0.78     0.48
      Fournier     OTHERS        9.29    19.44    -1.12    -0.12    12.56    71.87     0.78     0.53
      Savelieva    GEOSTATS      9.11    19.68    -1.39    -2.18    12.49    69.08     0.78     0.56
      Palaseanu    GEOSTATS      9.05    19.76     1.40     2.33    12.46    74.54     0.79     0.50
      Rigol S.     NN           12.10    20.30    -1.20    -9.40    15.80    84.10     0.67     0.12
      Pebesma      GEOSTATS      9.11    20.83    -1.22     0.92    12.44    73.73     0.79     0.50
      Pebesma      OTHERS        9.94    21.03    -1.35     4.50    13.32    72.12     0.78     0.51
      Ingram       GEOSTATS      9.08    21.77    -1.44     0.72    12.47    79.57     0.79     0.35
      Lophaven     GEOSTATS      9.70    22.20     1.20    -4.10    13.10    71.20     0.76     0.54
      Saveliev     SPLINES       9.30    22.20     1.60     0.60    12.60    76.40     0.78     0.41
      Ingram       GEOSTATS      9.47    22.53    -1.15     3.09    12.75    79.16     0.78     0.33
      Pebesma      GEOSTATS      9.11    23.26    -1.22     4.00    12.44    76.19     0.79     0.42
      Rigol S.     NN           16.00    25.30    -1.70   -11.10    20.80    87.50     0.55     0.02
      Hofierka     SPLINES       9.38    26.52    -1.27     4.29    12.68    77.98     0.78     0.38
      Dutta        NN            9.62    28.20     0.90    -0.22    12.70    80.10     0.78     0.31
      Pebesma      GEOSTATS      9.11    28.45    -1.22    12.01    12.44    81.41     0.79     0.38
      Dutta        NN           12.20    28.90     1.50    -1.29    15.90    79.90     0.64     0.33
      Rigol S.     NN           21.40    30.50     5.30     3.80    45.80    96.60     0.24     0.20
      Ingram       NN            9.72    38.29    -1.54     8.38    13.00    84.24     0.76     0.30
      Dutta        NN            9.93    38.50     2.18    17.98    13.30    87.30     0.76     0.27
      Ingram       NN            9.48    48.41    -1.22    -3.01    12.73    90.89     0.78     0.38
      Pebesma      GEOSTATS      9.11   146.36    -1.22    19.71    12.44   212.10     0.79    -0.27
  16. Results of the SIC 2004 exercise

      Participant  Method         MAE      MAE       ME       ME     RMSE     RMSE       Ro       Ro
                              routine    joker  routine    joker  routine    joker  routine    joker
      Timonin      NN            9.40    14.85    -1.25    -0.51    12.59    45.46     0.78     0.84
      Fournier     GEOSTATS      9.06    16.22    -1.32    -8.58    12.43    81.44     0.79     0.27
      Pozdnoukhov  SVM           9.22    16.25    -0.04    -6.70    12.47    81.00     0.79     0.28
      Saveliev     SPLINES       9.60    17.00     3.00    10.40    13.00    82.20     0.77     0.23
      Dutta        NN            9.92    17.50     0.20     5.10    13.10    80.60     0.76     0.29
      Ingram       GEOSTATS      9.10    18.55    -1.27    -4.64    12.46    54.22     0.79     0.86
      Hofierka     SPLINES       9.10    18.62    -1.30     0.41    12.51    73.68     0.79     0.50
      Hofierka     SPLINES       9.10    18.62    -1.30     0.41    12.51    73.68     0.79     0.50
      Fournier     GEOSTATS      9.22    19.43    -0.89    -0.22    12.51    73.50     0.78     0.48
      Fournier     OTHERS        9.29    19.44    -1.12    -0.12    12.56    71.87     0.78     0.53
      Savelieva    GEOSTATS      9.11    19.68    -1.39    -2.18    12.49    69.08     0.78     0.56
      Palaseanu    GEOSTATS      9.05    19.76     1.40     2.33    12.46    74.54     0.79     0.50
      Rigol S.     NN           12.10    20.30    -1.20    -9.40    15.80    84.10     0.67     0.12
      Pebesma      GEOSTATS      9.11    20.83    -1.22     0.92    12.44    73.73     0.79     0.50
      Pebesma      OTHERS        9.94    21.03    -1.35     4.50    13.32    72.12     0.78     0.51
      Ingram       GEOSTATS      9.08    21.77    -1.44     0.72    12.47    79.57     0.79     0.35
      Lophaven     GEOSTATS      9.70    22.20     1.20    -4.10    13.10    71.20     0.76     0.54
      Saveliev     SPLINES       9.30    22.20     1.60     0.60    12.60    76.40     0.78     0.41
      Ingram       GEOSTATS      9.47    22.53    -1.15     3.09    12.75    79.16     0.78     0.33
      Pebesma      GEOSTATS      9.11    23.26    -1.22     4.00    12.44    76.19     0.79     0.42
      Rigol S.     NN           16.00    25.30    -1.70   -11.10    20.80    87.50     0.55     0.02
      Hofierka     SPLINES       9.38    26.52    -1.27     4.29    12.68    77.98     0.78     0.38
      Dutta        NN            9.62    28.20     0.90    -0.22    12.70    80.10     0.78     0.31
      Pebesma      GEOSTATS      9.11    28.45    -1.22    12.01    12.44    81.41     0.79     0.38
      Dutta        NN           12.20    28.90     1.50    -1.29    15.90    79.90     0.64     0.33
      Rigol S.     NN           21.40    30.50     5.30     3.80    45.80    96.60     0.24     0.20
      Ingram       NN            9.72    38.29    -1.54     8.38    13.00    84.24     0.76     0.30
      Dutta        NN            9.93    38.50     2.18    17.98    13.30    87.30     0.76     0.27
      Ingram       NN            9.48    48.41    -1.22    -3.01    12.73    90.89     0.78     0.38
      Pebesma      GEOSTATS      9.11   146.36    -1.22    19.71    12.44   212.10     0.79    -0.27
  17. Modeling of wind fields with MLP and regularization technique (pp. 168-172 of the book)
      Monitoring network: 111 stations in Switzerland (80 for training + 31 for validation)
      Mapping of daily:
      • Mean speed
      • Maximum gust
      • Average direction
  18. Modeling of wind fields with MLP and regularization technique
      Monitoring network: 111 stations in Switzerland (80 for training + 31 for validation)
      Mapping of daily:
      • Mean speed
      • Maximum gust
      • Average direction
      Input information:
      • X, Y geographical coordinates
      • DEM (resolution 500 m)
      • 23 DEM-based “geo-features”
      Total: 26 features
      Model: MLP 26-20-20-3
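A quick count of the free parameters in an MLP 26-20-20-3 helps explain why regularization matters with only 80 training stations. The sketch assumes fully connected layers with one bias per neuron, which the slides do not state explicitly; the function name is illustrative.

```python
def mlp_parameter_count(layers):
    """Weights plus biases of a fully connected MLP (one bias per neuron assumed)."""
    return sum((n_in + 1) * n_out for n_in, n_out in zip(layers, layers[1:]))

layers = [26, 20, 20, 3]                  # inputs, two hidden layers, outputs
n_params = mlp_parameter_count(layers)    # 27*20 + 21*20 + 21*3
n_train_stations = 80
params_per_station = n_params / n_train_stations
```

Under these assumptions the network has 1023 adjustable parameters, more than twelve per training station, so it can easily fit the training data too closely.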
  19. Training of the MLP
      Model: MLP 26-20-20-3
      Training:
      • Random initialization
      • 500 iterations of the RPROP algorithm
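RPROP adapts a separate step size for every weight and uses only the sign of the gradient, not its magnitude. The sketch below shows the update rule on a simple quadratic function rather than on an MLP. It follows the iRprop- variant with commonly used default factors (1.2 and 0.5); the variant, the defaults, and the test function are assumptions and may differ from what Machine Learning Office uses.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def rprop_minimize(grad, w, iterations=500, step0=0.1,
                   eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """Minimize a function from its gradient with an RPROP-style rule (iRprop-)."""
    w = list(w)
    steps = [step0] * len(w)
    prev = [0.0] * len(w)
    for _ in range(iterations):
        g = list(grad(w))
        for i in range(len(w)):
            change = prev[i] * g[i]
            if change > 0:
                # Same sign as before: accelerate.
                steps[i] = min(steps[i] * eta_plus, step_max)
            elif change < 0:
                # Sign flipped: the minimum was overshot, so shrink the step
                # and skip the update for this weight in this iteration.
                steps[i] = max(steps[i] * eta_minus, step_min)
                g[i] = 0.0
            w[i] -= sign(g[i]) * steps[i]
            prev[i] = g[i]
    return w

def quadratic_grad(w):
    """Gradient of the toy function f(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2."""
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]

solution = rprop_minimize(quadratic_grad, [0.0, 0.0])
```

For an MLP, `grad` would return the gradient of the training error with respect to all weights, computed by backpropagation over the whole training set.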
  20. Results: naïve approach
  21. Results: noise injection regularization
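The slide gives only a title. Assuming the regularization meant here is noise injection, that is, adding small zero-mean random noise to the training inputs each time they are presented to the network, the idea can be sketched as follows. The noise level, the function name, and the toy inputs are illustrative assumptions, and a single noise level presumes that the input features have been scaled to comparable ranges.

```python
import random

def noisy_copy(inputs, noise_std, rng):
    """Return a copy of the inputs with zero-mean Gaussian noise on every feature."""
    return [[x + rng.gauss(0.0, noise_std) for x in row] for row in inputs]

rng = random.Random(0)                       # fixed seed for a reproducible sketch
inputs = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]  # hypothetical scaled feature vectors
for epoch in range(3):
    batch = noisy_copy(inputs, noise_std=0.05, rng=rng)
    # A training step on (batch, targets) would go here; the targets stay unchanged.
```

Because the network never sees exactly the same input twice, it is pushed towards smoother mappings, which counteracts overfitting on a small training set.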
  22. Results: summary
      [Maps compared: noise injection regularization; without regularization (overfitting)]
  23. Thank you for your attention!
      Next stop is: June 20, 09:00 – 12:00, Room T120
      Practical work session using Machine Learning software
