Mapping and classification of spatial data using machine learning: algorithms and software tools Vadim Timonin – Institute of Geomatics and Risk Analysis (IGAR), University of Lausanne (Switzerland)
Intelligent Analysis of Environmental Data (S4 ENVISA Workshop 2009)

Transcript

  • 1. Mapping and classification of spatial data using Machine Learning Office software tools. Vadim Timonin, Institute of Geomatics and Analysis of Risk, University of Lausanne, Switzerland. Vadim.Timonin@UNIL.ch
  • 2. Contents • Short description of the Machine Learning Office • SIC 2004: application to the automatic cartography of radioactivity • Case study: wind field mapping with a neural network and a regularization technique
  • 3. Machine Learning Office Part of the book: EPFL press June 2009
  • 4. June 20 09:00 – 12:00 Room T120 Practical work session using Machine Learning software
  • 5. Machine Learning Office: supervised methods. Regression: • Multilayer Perceptron (MLP) • General Regression Neural Networks (GRNN) • Radial Basis Function Neural Networks (RBFNN) • K-Nearest Neighbour (KNN) • Support Vector Regression (SVR). Classification: • Multilayer Perceptron (MLP) • Probabilistic Neural Networks (PNN) • K-Nearest Neighbour (KNN) • Support Vector Machines (SVM)
  • 6. Machine Learning Office: unsupervised methods. Clustering & density estimation: • K-Means & EM algorithms • Gaussian Mixture Model (GMM) • Self-Organizing (Kohonen) Maps (SOM)
  • 7. Machine Learning Office: mixture of supervised and unsupervised methods. Joint density estimation: • Mixture Density Networks (MDN)
  • 8. Automatic Mapping of Pollution Data. The procedure should be: 1. Simple, without difficult tuning of the models (usable by non-experts in machine learning) 3. Unique in its result (the result should not depend on the training algorithm, initial values, etc.)
  • 9. Automatic Mapping of Pollution Data Good candidates: 1. KNN 2. GRNN / PNN Not so good candidates (?): 1. MLP 2. RBFNN 3. SVM / SVR
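A GRNN is a Nadaraya-Watson kernel regression, which is one way to see why it suits the criteria above: with an isotropic Gaussian kernel there is a single parameter (the kernel width), and the prediction is deterministic once that width is fixed. The following is a minimal sketch in Python with NumPy, assuming an isotropic kernel and leave-one-out cross-validation over a user-supplied grid of widths. The function names and the grid are illustrative and are not taken from the Machine Learning Office software.

```python
import numpy as np

def grnn_predict(x_train, y_train, x_query, sigma):
    """Nadaraya-Watson estimate with an isotropic Gaussian kernel of width sigma."""
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    # Guard against all weights underflowing to zero far from the data.
    return (w @ y_train) / np.maximum(w.sum(axis=1), 1e-300)

def loo_mse(x_train, y_train, sigma):
    """Leave-one-out mean squared error: each point is predicted from all others."""
    d2 = ((x_train[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(w, 0.0)  # remove each point's weight on itself
    pred = (w @ y_train) / np.maximum(w.sum(axis=1), 1e-300)
    return float(np.mean((pred - y_train) ** 2))

def tune_sigma(x_train, y_train, sigmas):
    """Return the candidate width with the lowest leave-one-out error."""
    errors = [loo_mse(x_train, y_train, s) for s in sigmas]
    return float(sigmas[int(np.argmin(errors))])
```

Because the tuning is a grid search over one parameter, repeating it on the same data gives the same map, unlike a model trained from random initial weights.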
  • 10. Automatic Mapping with Prior Knowledge in situations of Routine and Emergency Spatial Interpolation Comparison 2004 http://www.ai-geostats.org/ Official report: Automatic mapping algorithms for routine and emergency monitoring data. EUR 21595 EN EC. Dubois G. (Ed.), Office for Official Publications of the European Communities, Luxembourg, 150 p., November 2005.
  • 11. Spatial Interpolation Comparison 2004. Introduction: description of the concept of SIC 2004. Participants are invited to use 200 observations (left, circles) to estimate (predict) values at 1008 locations (right, crosses).
  • 12. Spatial Interpolation Comparison 2004. Introduction: prior data sets. From these 1008 monitoring locations, a single sampling scheme of 200 monitoring stations was selected randomly and extracted for each of the 10 datasets, in order to allow participants to train and design their algorithms. These 200 sampling locations have a spatial distribution that can be considered nearly random. The summary statistics show that the subsets of 200 points are representative of the whole set of 1008 points. Note that it is the participant's choice whether or not to use this prior information for modeling.

    Set   Training sets (n = 200)                 Full sets (n = 1008)
    No    Min    Mean   Median  Max    Std.Dev    Min    Mean   Median  Max    Std.Dev
    1     55.8   97.6   98.0    150.0  19.1       55.0   98.9   99.5    193.0  21.1
    2     55.9   97.4   97.9    155.0  19.3       54.9   98.8   99.5    188.0  21.2
    3     59.9   98.8   100.0   157.0  18.5       59.9   100.3  101.0   192.0  20.4
    4     56.1   93.8   94.8    152.0  16.8       56.1   95.1   95.4    180.0  18.8
    5     56.4   92.4   92.0    143.0  16.6       56.1   93.7   94.0    168.0  18.1
    6     54.4   89.8   90.4    133.0  15.9       54.4   90.9   91.6    168.0  17.2
    7     56.1   91.7   91.7    140.0  16.2       56.1   92.5   92.9    166.0  16.9
    8     54.9   92.4   92.5    148.0  16.6       54.9   93.5   94.1    176.0  18.1
    9     56.5   96.6   97.0    149.0  18.2       56.5   97.8   98.7    183.0  19.9
    10    54.9   95.4   95.7    152.0  17.2       54.9   96.6   97.1    183.0  19.0
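The comparison of a random 200-point subset against the full 1008-point set can be sketched as follows. The SIC 2004 measurements themselves are not reproduced here, so the values below are synthetic stand-ins. The helper names and the use of the sample standard deviation (`ddof=1`) are assumptions, since the slide does not state which convention was used.

```python
import numpy as np

def summary_stats(values):
    """Min, mean, median, max and standard deviation of a 1-D array."""
    v = np.asarray(values, dtype=float)
    return {
        "Min": float(v.min()),
        "Mean": float(v.mean()),
        "Median": float(np.median(v)),
        "Max": float(v.max()),
        "Std.Dev": float(v.std(ddof=1)),  # assumption: sample standard deviation
    }

def draw_subset(n_total, n_train, seed=0):
    """Indices of n_train locations drawn at random without replacement."""
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(n_total, size=n_train, replace=False))

# Synthetic stand-in for one dataset of 1008 measurements (not SIC 2004 data).
rng = np.random.default_rng(1)
full = rng.normal(100.0, 20.0, size=1008)
train = full[draw_subset(1008, 200)]

for name, data in (("training", train), ("full", full)):
    print(name, {k: round(v, 1) for k, v in summary_stats(data).items()})
```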
  • 13. Results of the GRNN models with cross-validation tuning. Map panels: emergency (joker) scenario; routine scenario. Annotation: epicentre of accident (hot spot).
  • 14. Results. The following table presents the participants' results for each of the two scenarios (routine and emergency). The results are sorted by the Mean Absolute Error (MAE) obtained in the emergency scenario. The other statistics shown in the table are the Mean Error (ME), which allows the bias of the results to be assessed, the Root Mean Squared Error (RMSE), and Pearson's correlation coefficient (Ro) between true and estimated values. • GEOSTATS denotes geostatistical techniques • NN denotes neural networks • SVM denotes Support Vector Machines. In each column, the best results are shown in bold.
  • 15. Results of the SIC 2004 exercise (authors are highlighted)

    Participant  Method    MAE               ME                RMSE              Ro
                           routine  joker    routine  joker    routine  joker    routine  joker
    Timonin      NN        9.40     14.85    -1.25    -0.51    12.59    45.46    0.78     0.84
    Fournier     GEOSTATS  9.06     16.22    -1.32    -8.58    12.43    81.44    0.79     0.27
    Pozdnoukhov  SVM       9.22     16.25    -0.04    -6.70    12.47    81.00    0.79     0.28
    Saveliev     SPLINES   9.60     17.00    3.00     10.40    13.00    82.20    0.77     0.23
    Dutta        NN        9.92     17.50    0.20     5.10     13.10    80.60    0.76     0.29
    Ingram       GEOSTATS  9.10     18.55    -1.27    -4.64    12.46    54.22    0.79     0.86
    Hofierka     SPLINES   9.10     18.62    -1.30    0.41     12.51    73.68    0.79     0.50
    Hofierka     SPLINES   9.10     18.62    -1.30    0.41     12.51    73.68    0.79     0.50
    Fournier     GEOSTATS  9.22     19.43    -0.89    -0.22    12.51    73.50    0.78     0.48
    Fournier     OTHERS    9.29     19.44    -1.12    -0.12    12.56    71.87    0.78     0.53
    Savelieva    GEOSTATS  9.11     19.68    -1.39    -2.18    12.49    69.08    0.78     0.56
    Palaseanu    GEOSTATS  9.05     19.76    1.40     2.33     12.46    74.54    0.79     0.50
    Rigol S.     NN        12.10    20.30    -1.20    -9.40    15.80    84.10    0.67     0.12
    Pebesma      GEOSTATS  9.11     20.83    -1.22    0.92     12.44    73.73    0.79     0.50
    Pebesma      OTHERS    9.94     21.03    -1.35    4.50     13.32    72.12    0.78     0.51
    Ingram       GEOSTATS  9.08     21.77    -1.44    0.72     12.47    79.57    0.79     0.35
    Lophaven     GEOSTATS  9.70     22.20    1.20     -4.10    13.10    71.20    0.76     0.54
    Saveliev     SPLINES   9.30     22.20    1.60     0.60     12.60    76.40    0.78     0.41
    Ingram       GEOSTATS  9.47     22.53    -1.15    3.09     12.75    79.16    0.78     0.33
    Pebesma      GEOSTATS  9.11     23.26    -1.22    4.00     12.44    76.19    0.79     0.42
    Rigol S.     NN        16.00    25.30    -1.70    -11.10   20.80    87.50    0.55     0.02
    Hofierka     SPLINES   9.38     26.52    -1.27    4.29     12.68    77.98    0.78     0.38
    Dutta        NN        9.62     28.20    0.90     -0.22    12.70    80.10    0.78     0.31
    Pebesma      GEOSTATS  9.11     28.45    -1.22    12.01    12.44    81.41    0.79     0.38
    Dutta        NN        12.20    28.90    1.50     -1.29    15.90    79.90    0.64     0.33
    Rigol S.     NN        21.40    30.50    5.30     3.80     45.80    96.60    0.24     0.20
    Ingram       NN        9.72     38.29    -1.54    8.38     13.00    84.24    0.76     0.30
    Dutta        NN        9.93     38.50    2.18     17.98    13.30    87.30    0.76     0.27
    Ingram       NN        9.48     48.41    -1.22    -3.01    12.73    90.89    0.78     0.38
    Pebesma      GEOSTATS  9.11     146.36   -1.22    19.71    12.44    212.10   0.79     -0.27
  • 16. Results of the SIC 2004 exercise (table repeated from slide 15)
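The four statistics reported in the results table (MAE, ME, RMSE, Ro) can be computed from paired true and estimated values as in the sketch below. The function name is illustrative, and the sign convention for ME (estimated minus true) is an assumption, since the slides do not state it.

```python
import numpy as np

def sic_scores(true, est):
    """MAE, ME, RMSE and Pearson correlation between true and estimated values."""
    true = np.asarray(true, dtype=float)
    est = np.asarray(est, dtype=float)
    err = est - true  # assumption: error = estimated - true
    return {
        "MAE": float(np.mean(np.abs(err))),        # mean absolute error
        "ME": float(np.mean(err)),                 # mean error (bias)
        "RMSE": float(np.sqrt(np.mean(err ** 2))), # root mean squared error
        "Ro": float(np.corrcoef(true, est)[0, 1]), # Pearson correlation
    }
```

Since RMSE squares the errors before averaging, an RMSE much larger than the MAE, as in the joker columns, typically indicates that a few locations carry very large errors.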
  • 17. Modeling of wind fields with MLP and regularization technique (pp 168-172 of the book) Monitoring network: 111 stations in Switzerland (80 training + 31 for validation) Mapping of daily: • Mean speed • Maximum gust • Average direction
  • 18. Modeling of wind fields with MLP and regularization technique. Monitoring network: 111 stations in Switzerland (80 training + 31 for validation). Mapping of daily: • Mean speed • Maximum gust • Average direction. Input information: • X, Y geographical coordinates • DEM (resolution 500 m) • 23 DEM-based "geo-features" • Total: 26 features. Model: MLP 26-20-20-3
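To give a sense of the size of an MLP 26-20-20-3, the sketch below counts its adjustable parameters and runs one forward pass. It assumes fully connected layers with one bias per unit, tanh hidden units and linear outputs. None of these details are stated on the slides, and the function names are illustrative.

```python
import numpy as np

def count_params(layers):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layers[:-1], layers[1:]))

def init_mlp(layers, seed=0):
    """Small random weights and zero biases for each layer."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(layers[:-1], layers[1:])]

def forward(params, x):
    """Forward pass: tanh hidden layers (assumed), linear output layer (assumed)."""
    h = x
    for w, b in params[:-1]:
        h = np.tanh(h @ w + b)
    w, b = params[-1]
    return h @ w + b

layers = [26, 20, 20, 3]           # 26 inputs, two hidden layers of 20, 3 outputs
params = init_mlp(layers)
out = forward(params, np.zeros((80, 26)))  # 80 training stations, 26 features
print(count_params(layers), out.shape)     # 1023 (80, 3)
```

Under these assumptions the network has 1023 parameters, while 80 training stations with 3 targets each provide only 240 target values, which is one reason such a model can overfit without regularization.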
  • 19. Training of the MLP Model: MLP 26-20-20-3 Training: • Random initialization • 500 iterations of the RPROP algorithm
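RPROP adapts a separate step size for every weight using only the sign of the gradient. The slides do not say which RPROP variant was used, so the sketch below assumes the iRprop- variant (no weight backtracking) with the commonly used factors 1.2 and 0.5. A simple quadratic stands in for the network error, and all names are illustrative.

```python
import numpy as np

def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One iRprop- update: grow the step while the gradient sign is stable,
    shrink it (and skip the move) when the sign flips."""
    prod = grad * prev_grad
    step = np.where(prod > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(prod < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(prod < 0, 0.0, grad)   # zeroed gradient means no move this time
    w = w - np.sign(grad) * step
    return w, grad, step

# Stand-in objective: f(w) = sum((w - target)^2), gradient 2 * (w - target).
target = np.array([3.0, -2.0])
w = np.zeros(2)                 # the slides use random initialization; zeros keep the demo simple
prev_grad = np.zeros(2)
step = np.full(2, 0.1)
for _ in range(500):            # 500 iterations, as on the slide
    grad = 2.0 * (w - target)
    w, prev_grad, step = rprop_step(w, grad, prev_grad, step)
print(w)
```

In a real training run the gradient would come from backpropagation over the whole training set, since RPROP is a batch method.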
  • 20. Results: naïve approach
  • 21. Results: noise injection regularization
  • 22. Results: summary. Map panels: noise injection regularization; without regularization (overfitting)
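Assuming the regularization named on slides 21 and 22 is noise injection (also called jittering), the idea is to train on inputs perturbed by fresh random noise at every epoch, so that the network cannot fit each station exactly. The sketch below shows only the data side of that idea. The generator name, the Gaussian noise model and the per-feature scaling are assumptions, not details taken from the slides.

```python
import numpy as np

def noisy_epochs(x, y, n_epochs, noise_std, seed=0):
    """Yield (x_noisy, y) pairs, one per epoch, with fresh Gaussian noise added
    to the inputs. noise_std is relative to each feature's standard deviation."""
    rng = np.random.default_rng(seed)
    scale = noise_std * x.std(axis=0, keepdims=True)
    for _ in range(n_epochs):
        yield x + rng.normal(size=x.shape) * scale, y

# Stand-in data with the shapes from the case study: 80 stations, 26 features, 3 targets.
rng = np.random.default_rng(1)
x = rng.normal(size=(80, 26))
y = rng.normal(size=(80, 3))
for x_noisy, y_epoch in noisy_epochs(x, y, n_epochs=3, noise_std=0.1):
    # A training step of the chosen optimizer would use (x_noisy, y_epoch) here.
    print(x_noisy.shape, float(np.abs(x_noisy - x).mean()))
```

The noise level acts as the regularization strength: zero noise recovers ordinary training, and larger values give smoother maps.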
  • 23. Thank you for your attention! Next stop is: June 20 09:00 – 12:00 Room T120 Practical work session using Machine Learning software