Kernel based models for geo- and environmental sciences - Alexei Pozdnoukhov – National Centre for Geocomputation, National University of Ireland, Maynooth (Ireland)

    Presentation Transcript

    1. Kernel Methods (Support Vector Machines) for Environmental and Geo- Sciences Alexei Pozdnoukhov Lecturer National Centre for Geocomputation National University of Ireland, Maynooth +353 (0)1 7086146 Alexei.Pozdnoukhov@nuim.ie
    2. Machine Learning
    3. Learning From Data • Environmental monitoring Current rate of data acquisition is about 0.5Tb/day (increasing at 82% per year) • Remote Sensing Data NASA holds more than 10Pb of data, increasing by 10x every 5 years. ESA data stream is about 0.5Tb/year, likely to increase by 20x in next 5 years. • GIS, DEM • Sensor Networks • Field Measurements
    4. Clustering Cluster 1 Cluster 2
    5. Dimensionality Reduction
    6. Classification Binary Multi-Class
    7. Regression y Input, x
    8. Curse of Dimensionality Sensor Network Sensor Network Need more data? Batteries Recharged at WSN Human activity Remote Sensing Wireless Sensor Network Geographical Information
    9. Detecting Events • Observed environment: high-dimensional input space • Events: very rare, extreme • High-dimensional spaces: risk of overfitting • Robust to noise in both inputs/outputs • Non-linear and non-parametric • Computationally efficient for real-time processing and LBS dissemination
    10. Curse of Dimensionality
    11. Statistical Learning Theory • Models that can generalise from data • Good predictive abilities • Complexity can be controlled
    12. Statistical Learning Theory • Occam’s Razor Principle (14th century): one should not increase, beyond what is necessary, the number of entities required to explain anything. • When many solutions are available for a given problem, we should select the simplest one. • But what do we mean by simple? • We will use prior knowledge of the problem at hand to define what a simple solution is (example of a prior: smoothness).
    13. Occam’s Razor and Classification
                          Model 1   Model 2   Model 3
        Complexity        √√        √         ××
        Training error    ××        √         √√
        Overall           -         √         -
    14. Structural Risk Minimization • Define a set of learning functions, $\{S\}$, with $F = \{f(x, \alpha),\ \alpha \in \Lambda\}$ • Order it in terms of complexity, $\{S_1, \dots, S_N\}$ • Select the optimal $S^*$
    15. Classification Support Vector Machine SVM
    16. Separating Hyperplane • x - input patterns • w - weight vector • b - threshold • $f_{w,b}(x) = \mathrm{sign}(w \cdot x + b)$ • How powerful are linear decision functions?
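A minimal sketch of the linear decision function on this slide, in plain Python. The weight vector, threshold, and sample points are illustrative values, not taken from the presentation, and mapping a score of exactly 0 to +1 is a convention assumed here.

```python
def linear_decision(w, b, x):
    """Return +1 or -1 according to sign(w . x + b); a score of 0 maps to +1 (assumed convention)."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Illustrative (assumed) hyperplane parameters in the plane.
w = [1.0, -1.0]
b = 0.5

print(linear_decision(w, b, [2.0, 1.0]))  # score = 2 - 1 + 0.5 = 1.5
print(linear_decision(w, b, [0.0, 2.0]))  # score = 0 - 2 + 0.5 = -1.5
```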
    17. VC-dimension in classification • Shattering: a set of samples is shattered if the decision functions can discriminate it for all possible class memberships; the VC-dimension h is the largest number of samples that can be shattered. • [Figure: 3 samples in the plane can be shattered; 4 samples cannot] • The VC-dimension h of the linear decision functions in $R^N$ equals N+1. • That is, the power of linear decision functions is beyond our control…?
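The shattering idea can be checked numerically in the plane (N = 2, so h = 3). The sketch below is an illustration under assumptions: it uses a plain perceptron as the separability test, and the point sets and the epoch cap are chosen here. Failing to converge within the cap is not a proof of non-separability in general; for the four corners of the unit square the two XOR-type labelings are known to be non-separable.

```python
from itertools import product


def perceptron_separates(points, labels, max_epochs=1000):
    """Return True if the perceptron finds w, b with y * (w . x + b) > 0 for every sample."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                # Standard perceptron update on a misclassified sample.
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            return True
    return False  # no separating line found within the epoch cap


def count_separable_labelings(points):
    """Count the class assignments of `points` that a line can realise."""
    return sum(
        perceptron_separates(points, labels)
        for labels in product([-1, 1], repeat=len(points))
    )


three = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]              # non-collinear points
four = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]   # corners of the unit square

print(count_separable_labelings(three), "of", 2 ** 3)
print(count_separable_labelings(four), "of", 2 ** 4)
```

All 8 labelings of the three points are separable, so the set is shattered; for the four corners only 14 of the 16 labelings are, so that set is not.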
    18. Support Vector Machine • Decision function is a margin hyperplane (*): $f(x, \{w, b\}) = +1$ if $(w \cdot x) + b \ge 1$, and $-1$ if $(w \cdot x) + b \le -1$ • Intuition: a large margin is good. • Lemma: given that the N-dimensional data $\{x_1, x_2, \dots, x_L\}$ lie inside a finite enclosing sphere of radius R, the VC-dimension h of the margin-based decision functions (*) satisfies $h \le \min(R^2 \|w\|^2, N) + 1$. • The complexity (VC-dimension) can be controlled with $\|w\|^2$!
    19. Separating Hyperplane: Max Margin • To maximize the margin ρ, one would like to minimize $\|w\|$, or equivalently $\|w\|^2$. • $f_{w,b}(x) = \mathrm{sign}((w \cdot x) + b)$, with $f_{w,b}(x) = +1$ if $(w \cdot x) + b \ge 1$ and $-1$ if $(w \cdot x) + b \le -1$
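Why minimizing $\|w\|$ maximizes the margin: a short worked step, written with the $w \cdot x + b$ convention of the sign form of the decision function. The points $x_+$ and $x_-$ are notation introduced here for samples lying exactly on the two margin hyperplanes.

```latex
% x_+ and x_- lie on the margin hyperplanes:
%   w . x_+ + b = +1,    w . x_- + b = -1
\begin{align*}
w \cdot (x_+ - x_-) &= 2, \\
\rho = \frac{w}{\|w\|} \cdot (x_+ - x_-) &= \frac{2}{\|w\|}.
\end{align*}
```

The margin is the projection of $x_+ - x_-$ onto the unit normal $w / \|w\|$, so a smaller $\|w\|$ gives a larger $\rho$.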
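    20. Optimization Problem, Lagrangian • Primal: $\min_{w,b} \frac{1}{2}\|w\|^2$ subject to $y_i (w \cdot x_i + b) \ge 1$, $i = 1, \dots, L$ • Lagrangian: $L_P = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{L} \alpha_i \left( y_i (w \cdot x_i + b) - 1 \right)$ • Stationarity: $\sum_{i=1}^{L} \alpha_i y_i = 0$, $w = \sum_{i=1}^{L} \alpha_i y_i x_i$ • KKT conditions: $\alpha_i \left( y_i (w \cdot x_i + b) - 1 \right) = 0$, $\forall i$; samples with $\alpha_i > 0$ are Support Vectors, all other samples have $\alpha_i = 0$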
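The step from the Lagrangian to the dual objective is the standard elimination of $w$ and $b$. It is spelled out here as a worked derivation; the transcript itself only states the start and end points.

```latex
\begin{align*}
\frac{\partial L_P}{\partial w} = 0 \;&\Rightarrow\; w = \sum_{i=1}^{L} \alpha_i y_i x_i, \\
\frac{\partial L_P}{\partial b} = 0 \;&\Rightarrow\; \sum_{i=1}^{L} \alpha_i y_i = 0, \\
L_P &= \frac{1}{2}\|w\|^2 - \sum_{i=1}^{L} \alpha_i y_i (w \cdot x_i)
       - b \sum_{i=1}^{L} \alpha_i y_i + \sum_{i=1}^{L} \alpha_i \\
    &= \frac{1}{2}\sum_{i,j=1}^{L} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)
       - \sum_{i,j=1}^{L} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^{L} \alpha_i \\
    &= \sum_{i=1}^{L} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{L} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j).
\end{align*}
```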
    21. Optimization Problem: Dual Variables • Maximize $L_D = \sum_{i=1}^{L} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{L} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$ subject to $\sum_{i=1}^{L} \alpha_i y_i = 0$, $\alpha_i \ge 0$, $i = 1, \dots, L$ • $f(x) = \mathrm{sign}(w \cdot x + b) = \mathrm{sign}\left( \sum_{i=1}^{L} \alpha_i y_i (x \cdot x_i) + b \right)$ • inputs are presented as dot products • Quadratic Programming • convex problem, nice theoretical field • unique solution, good solvers
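For the smallest possible training set, one sample per class, the dual can be solved by hand, which makes the roles of the dual variables, w, and b concrete. This is a toy sketch with assumed sample coordinates, not a general QP solver.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def two_point_hard_margin(x_pos, x_neg):
    """Solve the hard-margin dual for one sample per class (y = +1 and y = -1).

    The constraint sum_i alpha_i * y_i = 0 forces alpha_pos = alpha_neg = alpha, so
    L_D(alpha) = 2 * alpha - 0.5 * alpha**2 * ||x_pos - x_neg||**2,
    which is maximal at alpha = 2 / ||x_pos - x_neg||**2.
    """
    diff = [p - n for p, n in zip(x_pos, x_neg)]
    alpha = 2.0 / dot(diff, diff)
    w = [alpha * d for d in diff]   # w = sum_i alpha_i * y_i * x_i
    b = 1.0 - dot(w, x_pos)         # KKT: y_pos * (w . x_pos + b) = 1
    return alpha, w, b


# Assumed toy samples: (2, 2) in class +1 and (0, 0) in class -1.
alpha, w, b = two_point_hard_margin([2.0, 2.0], [0.0, 0.0])
print(alpha, w, b)
```

Both samples end up exactly on the margin (w . x + b equals +1 and -1), so both are support vectors.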
    22. Soft margin hyperplane: allowing for the training error • Primal: $\min \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{L} \xi_i$ subject to $y_i (w \cdot x_i + b) \ge 1 - \xi_i$, $\xi_i \ge 0$, $i = 1, \dots, L$ • Dual: $L_D = \sum_{i=1}^{L} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{L} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$ subject to $\sum_{i=1}^{L} \alpha_i y_i = 0$, $0 \le \alpha_i \le C$, $i = 1, \dots, L$ • C - regularization parameter: trade-off between margin maximization & training error
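A small sketch of how the slack variables and the regularization parameter C enter the soft-margin primal objective. The hyperplane, the samples, and the value of C are assumed for illustration; for a fixed (w, b) the smallest feasible slack is max(0, 1 - y_i (w . x_i + b)).

```python
def soft_margin_objective(w, b, C, samples):
    """Return (objective, slacks) for 0.5 * ||w||^2 + C * sum(xi_i) at a fixed (w, b)."""
    slacks = []
    for x, y in samples:
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        slacks.append(max(0.0, 1.0 - margin))  # zero when the margin constraint holds
    objective = 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)
    return objective, slacks


# Assumed toy data: one sample outside the margin, one inside it, one misclassified.
samples = [([2.0, 0.0], +1), ([0.5, 0.0], +1), ([0.5, 0.0], -1)]
objective, slacks = soft_margin_objective([1.0, 0.0], 0.0, 10.0, samples)
print(slacks, objective)
```

A larger C makes the slack terms dominate the objective, pushing the solution towards fewer training errors at the cost of a narrower margin.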
    23. Support Vector Terminology • $\alpha_i = 0$: Normal Samples • $0 < \alpha_i < C$: Support Vectors • $\alpha_i = C$: Support Vectors, untypical or noisy • $f(x) = \mathrm{sign}\left( \sum_{i=1}^{L} \alpha_i y_i (x \cdot x_i) + b \right)$ • C - regularization parameter: trade-off between margin maximization & training error
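The three cases can be read off a vector of dual variables returned by any solver. A minimal sketch: the tolerance, the category names, and the example values are assumptions made here, since solvers return the bounds only up to numerical precision.

```python
def sample_category(alpha, C, tol=1e-8):
    """Map a dual variable alpha_i to the terminology of the slide."""
    if alpha <= tol:
        return "normal sample"             # alpha_i = 0
    if alpha >= C - tol:
        return "support vector at bound"   # alpha_i = C: untypical or noisy
    return "support vector"                # 0 < alpha_i < C: on the margin


# Assumed example values for C = 10.
alphas = [0.0, 3.2, 10.0, 1e-12]
categories = [sample_category(a, C=10.0) for a in alphas]
print(categories)
```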
    24. Support Vector Algorithm: Kernel Trick • If data is not linearly separable, it can be projected into a (sufficiently) high-dimensional space. There it is much easier to separate! • Example: $K(x, x') = (x \cdot x')^2$ corresponds to $\Phi: (x_1, x_2) \to (x_1^2, \sqrt{2}\, x_1 x_2, x_2^2)$ • $x \to \Phi(x)$? The algorithm was formulated in terms of dot products! $x \cdot x' \to \Phi(x) \cdot \Phi(x') \Leftrightarrow x \cdot x' \to K(x, x')$ • K is symmetric • K is positive-definite
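The example on this slide can be verified numerically: the dot product of the explicitly mapped vectors equals the kernel evaluated on the original inputs. The two test points are arbitrary values chosen for the check.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def phi(x):
    """Explicit feature map for K(x, x') = (x . x')^2 in two dimensions."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]


def k_quadratic(x, z):
    """The same quantity computed in input space, without building phi."""
    return dot(x, z) ** 2


x, z = [1.0, 2.0], [3.0, 4.0]
print(dot(phi(x), phi(z)), k_quadratic(x, z))  # both equal (1*3 + 2*4)^2 up to rounding
```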
    25. Nonlinear SVM. Kernel trick. • $f(x) = w \cdot x + b \;\to\; f(x) = \sum_{i=1}^{L} y_i \alpha_i K(x, x_i) + b$ • Any linear algorithm, formulated in terms of dot products of input data, can be modified into a non-linear one using the kernel trick. • Support Vector Machine • Kernel Ridge Regression • Kernel Principal Component Analysis • Kernel Fisher Discriminant Analysis • etc.
    26. Nonlinear SVM. Kernel types. • Polynomial kernel: $K(x, y) = (x \cdot y + 1)^p$ • Radial Basis Function kernel: $K(x, y) = e^{-\|x - y\|^2 / (2\sigma^2)}$ • $f(x) = \mathrm{sign}\left( \sum_{i \in SV} y_i \alpha_i K(x, x_i) + b \right)$
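Both kernels are a few lines of plain Python. The function names and the test points are chosen here for illustration.

```python
import math


def polynomial_kernel(x, y, p):
    """K(x, y) = (x . y + 1)^p"""
    return (sum(a * b for a, b in zip(x, y)) + 1.0) ** p


def rbf_kernel(x, y, sigma):
    """K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


print(polynomial_kernel([1.0, 2.0], [3.0, 4.0], p=2))  # (11 + 1)^2
print(rbf_kernel([0.0, 0.0], [1.0, 0.0], sigma=1.0))   # exp(-1/2)
print(rbf_kernel([0.3, 0.7], [0.3, 0.7], sigma=0.1))   # identical points
```

The RBF kernel equals 1 for identical points and decays with distance at a rate set by the bandwidth σ.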
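    27. Nonlinear SVM. Optimization problem. • $L_D = \sum_{i=1}^{L} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{L} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$ subject to $\sum_{i=1}^{L} \alpha_i y_i = 0$, $0 \le \alpha_i \le C$, $i = 1, \dots, L$ • $b = y_j - \sum_{i=1}^{L} y_i \alpha_i K(x_i, x_j)$ for any j with $0 < \alpha_j < C$ • $f(x) = \mathrm{sign}\left( \sum_{i \in SV} y_i \alpha_i K(x, x_i) + b \right)$ • K is positive-definite, so this is still a QP problem, hence a unique solution!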
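Once the dual variables are known, the threshold b and the decision function follow directly from the kernel expansion. The sketch below assumes a two-sample toy problem with an RBF kernel; for one sample per class the dual has the closed-form solution used in the code, whereas real problems need a QP solver. All names are illustrative.

```python
import math


def rbf(x, z, sigma):
    return math.exp(-sum((a - c) ** 2 for a, c in zip(x, z)) / (2.0 * sigma ** 2))


def bias_from_margin_sv(j, xs, ys, alphas, kernel):
    """b = y_j - sum_i y_i alpha_i K(x_i, x_j), for a sample j with 0 < alpha_j < C."""
    return ys[j] - sum(y * a * kernel(x, xs[j]) for x, y, a in zip(xs, ys, alphas))


def decide(x, xs, ys, alphas, b, kernel):
    """f(x) = sign(sum_i y_i alpha_i K(x, x_i) + b)"""
    score = sum(y * a * kernel(x, xi) for xi, y, a in zip(xs, ys, alphas)) + b
    return 1 if score >= 0 else -1


def kernel(x, z):
    return rbf(x, z, sigma=1.0)


# Assumed toy training set: one sample per class.
xs = [[0.0, 0.0], [1.0, 0.0]]
ys = [-1, +1]

# With one sample per class, alpha_1 = alpha_2 = 2 / (K11 + K22 - 2 * K12).
a = 2.0 / (kernel(xs[0], xs[0]) + kernel(xs[1], xs[1]) - 2.0 * kernel(xs[0], xs[1]))
alphas = [a, a]

b = bias_from_margin_sv(0, xs, ys, alphas, kernel)
print(b, decide([0.9, 0.0], xs, ys, alphas, b, kernel), decide([0.1, 0.0], xs, ys, alphas, b, kernel))
```

By symmetry the threshold comes out as zero (up to rounding), and each query point is assigned to the class of the nearer training sample.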
    28. Support Vector Machine http://www.geokernels.org/teaching/svm
    29. SVM: Software.
    30. Examples
    31. SV Porosity Mapping Data description 200 training samples “+” 94 validation samples minimum = 0.0 median = 0.515 max = 1.000 mean = 0.53 variance = 0.048 The original continuous data were transformed into 2-class data according to the 0.5 threshold: If fpor ≥ 0.5, then y = +1 If fpor < 0.5, then y = -1
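The 2-class transformation is a single threshold rule. A minimal sketch follows; the list of porosity values is an assumed illustration (it reuses the minimum, median, and maximum quoted on the slide plus two made-up values), not the actual data set.

```python
def to_class(fpor, threshold=0.5):
    """Return +1 if fpor >= threshold, otherwise -1."""
    return 1 if fpor >= threshold else -1


# Illustrative values only, not the real porosity measurements.
values = [0.0, 0.49, 0.5, 0.515, 1.0]
labels = [to_class(v) for v in values]
print(labels)
```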
    32. SV Porosity Mapping Data: 2-class transformation • class “+1”, ≥ 0.5 o class “-1”, < 0.5 + validation data
    33. SV Porosity Mapping Data loading 150 training samples 50 testing samples Prediction Grid
    34. SV Porosity Mapping: Hyper-parameters tuning • Gaussian RBF kernel is selected: $K(x, x') = e^{-\|x - x'\|^2 / (2\sigma^2)}$ • Two hyper-parameters: C and σ. • Grid search: testing error analysis for every pair of parameters. • The range of σ: min(σ) - minimum distance between data samples; max(σ) - maximum distance between data samples • The range of log(C): min(C) - some small value, 1 or less; max(C) - depends on data, 1e3-1e6 • Start calculation using testing data • Save results to file
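The grid-search recipe can be sketched as follows. Several parts are assumptions made for the sketch: σ is spaced linearly between the minimum and maximum pairwise distance, C is spaced on a log scale, and `testing_error` is a hypothetical callable that would train an SVM with the given (σ, C) and return its error on the testing data. A toy error surface stands in for it here so that the example runs on its own.

```python
import math
from itertools import combinations


def pairwise_distance_range(points):
    """Minimum and maximum Euclidean distance between distinct samples."""
    dists = [math.dist(p, q) for p, q in combinations(points, 2)]
    return min(dists), max(dists)


def linear_grid(lo, hi, n):
    return [lo + k * (hi - lo) / (n - 1) for k in range(n)]


def log_grid(lo, hi, n):
    """n values from lo to hi, evenly spaced in log10."""
    step = (math.log10(hi) - math.log10(lo)) / (n - 1)
    return [10.0 ** (math.log10(lo) + k * step) for k in range(n)]


def grid_search(sigmas, cs, testing_error):
    """Return (error, sigma, C) for the pair with the smallest testing error."""
    return min((testing_error(s, c), s, c) for s in sigmas for c in cs)


def toy_testing_error(sigma, c):
    # Stand-in surface with its minimum at sigma = 0.3, C = 10 (purely illustrative).
    return (sigma - 0.3) ** 2 + (math.log10(c) - 1.0) ** 2


points = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0)]   # assumed sample coordinates
sigma_min, sigma_max = pairwise_distance_range(points)
sigmas = linear_grid(sigma_min, sigma_max, 10)
cs = log_grid(0.1, 1000.0, 5)

best_error, best_sigma, best_c = grid_search(sigmas, cs, toy_testing_error)
print(best_sigma, best_c)
```

In practice the full error table would also be saved, so that the training error, testing error, and number of support vectors can be inspected as surfaces over (log C, σ).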
    35. SV Porosity Mapping: Hyper-parameters tuning • Training error surface [axes: Log(C), Gaussian RBF kernel bandwidth] • Training error increases with kernel bandwidth • decreases with C
    36. SV Porosity Mapping: Hyper-parameters tuning • Testing error surface [axes: Log(C), Gaussian RBF kernel bandwidth] • Complex structure, but generally, if the range is selected reasonably and data splitting is correct, there exists a region of minima: the optimal values.
    37. SV Porosity Mapping: Hyper-parameters tuning • Normalized number of Support Vectors [axes: Log(C), Gaussian RBF kernel bandwidth] • Represents the complexity of the model: a more complex model has more SVs.
    38. What are the parameters for the final model? Hyper-parameters selection Testing error C=3 σ = 0.09 Training error Normalized NSV
    39. What are the parameters for the final model? Hyper-parameters selection Testing error C = 18 σ = 0.13 Training error Normalized NSV
    40. SV Porosity Mapping Dependence on Parameters C = 10 σ 0.02 0.06 0.1 0.2 0.3 0.4 0.5
    41. SV Porosity Mapping Dependence on Parameters C=100 C=10 C=1 C=0.1 σ = 0.1
    42. SV Porosity Mapping Predictive Mapping and Support Vectors Predictive mapping + MARGIN + Normal SV, 0<α<C. + Critical SV, α=C.
    43. Applications for Natural Hazards • Topo-climatic mapping • Landslides • Snow avalanches prediction
    44. Weather observations • 110 meteo stations • Measurements, up to every 10min • Altitude: 270m-3580m • Temperature • Precipitation • Humidity • Air Pressure • Wind Speed • Insolation • Etc. Spatio-temporal prediction mapping?
    45. Temperature Inversion Can only be explained using terrain surface characteristics (convexity, slope, etc.)
    46. Physical Models at local scales • Terrain roughness is too high for physical models, computational speed, precision, uncertainty estimation… • PDE on smoothed terrain + empirical correction: $v_{Model}(x, y) = v_{Physical} + c_{Ridges} + c_{Canyons} + c_{Values} + c_{FlatAreas} + c_{Sea} + \dots$ • Can this information be extracted directly from data?
    47. Modelling Scheme [diagram labels: Data; DEM; Features; Feature Selection/Extraction; Predictive Modeling with Machine Learning (non-linear dependencies; noise, outliers); Spatio-Temporal Mapping; Analysis; Decision Support]
    48. Temperature vs. Elevation Mean Monthly Mean Daily Linear Locally Linear Regionalized Mean Hourly Mean Hourly Explained Non-linear Regionalized Temperature Inversion
    49. DEM Features Large Scale Difference of Gaussians Short Scale Difference of Gaussians Slope Local Variance
    50. Temperature Inversion Mapping Probability of Inversion Temperature
    51. Visual Validation
    52. Operational setting http://www.geokernels.org/services/meteo
    53. Applications • Topo-climatic mapping • Landslides • Snow avalanches prediction • Remote Sensing
    54. Landslide inventory SFI (SRC-ID 07/SRC/I1168)
    55. Method I Factor 1 Probability density estimation Factor 2 SFI (SRC-ID 07/SRC/I1168)
    56. Model vs. Training Data SFI (SRC-ID 07/SRC/I1168)
    57. What is wrong with this susceptibility map? SFI (SRC-ID 07/SRC/I1168)
    58. Method II Classification Stable Factor 1 Unstable Factor 2 SFI (SRC-ID 07/SRC/I1168)
    59. Predictive models SFI (SRC-ID 07/SRC/I1168)
    60. A model should fit the observed landslides, and … SFI (SRC-ID 07/SRC/I1168)
    61. Applications • Topo-climatic mapping • Landslides • Snow avalanches prediction • Remote Sensing
    62. Lochaber, Scotland • 1842 days of recorded weather conditions (11 features), 1991-2007 • 1135 days with documented avalanche events • 797 safe days, 245 with avalanches • 260 days unknown (mainly bad weather)
    63. Spatial Data Training data: 722 events, winters 1991-2005 Validation data: 72 events, winters 2006-2007 • 47 avalanche paths, x, y, z, slope, aspect, date • DEM, 10m resolution, 5km x 5km
    64. Lochaber weather observations • Snow index: 0-10 • No-settle: cumulative snow over a season • Rain at 900m: binary [0, 1] • Snow drift: binary [0, 1] • Air temperature: -10, … +10 • Wind speed: 0, … 25 m/s • Wind direction: 0°-360° • Cloudiness: [25, 50, 75, 100] • Foot penetration: 0, … 50 • Snow temperature: 0, … -10 • Insolation: cumulative over season
    65. Classification Problem • Input vector: [Z, Slope, Aspect: SN-WE, Spatialized Weather Features] • Class +1 (720): over all the documented avalanche events • Class -1 (44000): over all the 47 gullies for documented days without avalanches • 4 + 22 = 26
    66. Wind Speed and Direction Wind speed weighting: Correction for slope: Correction for curvature: Terrain-corrected wind direction:
    67. Snow accumulation Simple heuristics based on wind speed gradients If Snow index > 0 If Snow drift = 1 Snow accumulation = F(Wind Speed, Wind Direction)
    68. Results DEM Avalanche Danger
    69. Results wind Animation in 3D
    70. Applications • Topo-climatic mapping • Landslides • Snow avalanches prediction • Remote Sensing
    71. Inhabited areas Ground truth is known: population census Testing Training
    72. Inhabited areas Ground truth is known: population census
    73. Inhabited areas: examples
    74. Inhabited areas: examples
    75. Inhabited areas: examples
    76. Inhabited areas: examples
    77. Inhabited areas: examples
    78. Inhabited areas: examples
    79. Pre-processing and Features Mathematical morphology (image closing)
    80. Pre-processing and Features SIFT
    81. Pre-processing and Features Gaussian Mixture Model
    82. Pre-processing and Features
    83. Testing: inhabited areas
    84. Inhabited areas
    85. Inhabited areas
    86. Summary and Conclusions • Statistical Learning Theory • Classification Problem • Support Vector Machines and Kernel Methods • GeoSpatial Data Classification with SVM
    87. Open PhD positions at NCG Thank you! Alexei Pozdnoukhov Alexei.Pozdnoukhov@nuim.ie SFI (SRC-ID 07/SRC/I1168)

    + Beniamino Murgante, 4 months ago
