1. Feature Selection for Tree Species Identification in Very High Resolution Satellite Images
Matthieu Molinier and Heikki Astola
VTT Technical Research Centre of Finland
[email_address], [email_address]
IGARSS 2011, Vancouver, 28.7.2011
2. Introduction
  • NewForest – Renewal of Forest Resource Mapping
  • A 1.5-year study (2009-2010) funded by the Finnish Funding Agency for Technology and Innovation (TEKES), together with Finnish forest companies and research organizations (VTT and the University of Eastern Finland, UEF)
  • Study motivation:
  • Improve methods for operative forest inventory from remote sensing data
  • Species-wise estimates (e.g. stem volume) are not accurate enough (accuracy vs. cost)
3. NewForest approach in forest variable estimation
  • Modelling based on satellite image pixel reflectances and contextual features → segmentation estimates
  • Individual tree crown (ITC) detection and crown width estimation
  • Combining both data sources to predict the total amount and size variation by species → refined, more accurate species-wise estimates
4. Study site
  • Karttula / Kuopio, Central Finland (62.9007° N, 27.2392° E)
  • GeoEye image, 26.6.2009 (RGB, NIR): 10.5 km x 11.5 km, 3% clouds
  • Mixed forest, spruce dominated: 25% pine, 45% spruce, 30% deciduous (mainly birch)
5. Optical image data pre-processing
  • Rectification to a geographic coordinate system (WGS84, NUTM35)
  • Geocoding corrected using a Digital Elevation Model (Airborne Laser Scanning DEM): mean correction 2.65 m, maximum 20 m
  • Calibration to Top-Of-Atmosphere (TOA) reflectances using the band-specific calibration coefficients
  • Atmospheric correction to surface reflectances with the SMAC4 radiative transfer code
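The slides give only the processing chain, not the formulas. Below is a minimal sketch of the standard DN-to-TOA-reflectance conversion, assuming band-specific gain/offset coefficients and solar geometry taken from the image metadata; the SMAC4 atmospheric correction step is not reproduced here, and all names are illustrative rather than the study's own code.

```python
import numpy as np

def toa_reflectance(dn, gain, offset, esun, sun_elev_deg, earth_sun_dist_au):
    """Convert raw digital numbers of one band to top-of-atmosphere reflectance.

    dn                : 2-D array of raw pixel values
    gain, offset      : band-specific absolute calibration coefficients
    esun              : mean exo-atmospheric solar irradiance of the band [W m-2 um-1]
    sun_elev_deg      : solar elevation angle at acquisition time [degrees]
    earth_sun_dist_au : Earth-Sun distance on the acquisition date [AU]
    """
    radiance = gain * dn.astype(np.float64) + offset        # at-sensor radiance
    sun_zenith = np.deg2rad(90.0 - sun_elev_deg)            # zenith = 90 deg - elevation
    return (np.pi * radiance * earth_sun_dist_au ** 2) / (esun * np.cos(sun_zenith))
```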
6. Ground reference data
  • Training data – from 222 field plots:
  • 212 field plots within the GeoEye image area (2009)
  • 10 additional zero-stem-volume plots extracted visually
  • Tree species classification: training data from 20 pure-species field plots
  • Testing data – from 178 field plots (mixed species)
  • 178 field plots acquired in 2009, limited spatial distribution (several plots per forest stand)
  • In total: 1164 ground objects mapped (276 pines, 277 spruces, 347 deciduous, 264 non-trees)
  • GeoEye image: 10.5 km x 11.5 km
7. Input for feature selection – 35 + 4 features
  • SPECTRAL (5) – set A: R, G, B, NIR, PAN mean intensity within a 1.5 m radius around tree candidates (TC)
  • CONTEXTUAL (9) – set B: from PAN, 7.5 m radius around TC – mean, mean/median, skewness, kurtosis, contrast, pm1 (mean of brightest pixels), ps1 (std of brightest pixels), pm2 (mean of darkest pixels), ps2 (std of darkest pixels)
  • SEGMENT-WISE (21) – set C: from PAN, 3 segment sizes (50 m², 85 m², 125 m²) – mean, mean/median, skewness, kurtosis, std (standard deviation), pmean (partial mean), pstd (partial standard deviation)
  • Probe variables (4): random vectors or random permutations of a feature vector – probe_gauss1, probe_gauss2, probe_shuffle1, probe_shuffle2
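The exact feature implementations are not given on the slide. The sketch below illustrates, under assumptions, how the contextual statistics around one tree candidate and the probe variables could be computed with NumPy/SciPy; the tail fraction used for the brightest/darkest-pixel statistics and the max-minus-min contrast definition are guesses, not the study's definitions.

```python
import numpy as np
from scipy.stats import skew, kurtosis

def contextual_features(pan, row, col, radius_px, tail_frac=0.2):
    """Statistics of PAN pixels within a circular window around one tree
    candidate (radius_px ~ 7.5 m divided by the pixel size).
    tail_frac is an assumed fraction for the 'brightest/darkest pixels'."""
    dr = np.arange(pan.shape[0])[:, None] - row
    dc = np.arange(pan.shape[1])[None, :] - col
    vals = np.sort(pan[(dr ** 2 + dc ** 2) <= radius_px ** 2].astype(float))
    k = max(1, int(tail_frac * vals.size))
    bright, dark = vals[-k:], vals[:k]
    return {
        "mean": vals.mean(),
        "mean_over_median": vals.mean() / np.median(vals),
        "skewness": skew(vals),
        "kurtosis": kurtosis(vals),
        "contrast": vals.max() - vals.min(),          # assumed contrast definition
        "pm1": bright.mean(), "ps1": bright.std(),    # brightest pixels
        "pm2": dark.mean(),  "ps2": dark.std(),       # darkest pixels
    }

def probe_variables(n_samples, real_feature, seed=0):
    """Probe features a la Guyon et al.: pure Gaussian noise and a random
    permutation of an existing feature column."""
    rng = np.random.default_rng(seed)
    return {"probe_gauss": rng.normal(size=n_samples),
            "probe_shuffle": rng.permutation(real_feature)}
```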
8. Class definitions and training scheme
  • WHOLE DATASET: 1164 samples (900 trees, 264 non-trees)
  • Split 2/3 : 1/3 into MODEL DESIGN (773) and TESTING (391)
  • MODEL DESIGN split 2/3 : 1/3 into TRAINING (512), used for model building and feature ranking, and VALIDATION (261)
  • Stratified sampling to preserve class proportions
  • Tree classes: 1 pine, 2 spruce, 3 deciduous
  • Non-tree classes: 4 shadow, 5 open area / sunlit, 6 bare ground, 7 green vegetation
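A sketch of the stratified 2/3 : 1/3 splits with scikit-learn. The feature matrix and labels below are placeholders; in the study they come from the 1164 mapped ground objects and the 35 + 4 features.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1164, 39))      # placeholder feature matrix (35 real + 4 probe features)
y = rng.integers(1, 8, size=1164)    # placeholder class labels 1..7

# 2/3 model design vs. 1/3 test, then 2/3 train vs. 1/3 validation,
# stratified so that class proportions are preserved in every subset.
X_design, X_test, y_design, y_test = train_test_split(
    X, y, test_size=1 / 3, stratify=y, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(
    X_design, y_design, test_size=1 / 3, stratify=y_design, random_state=42)

print(len(y_train), len(y_val), len(y_test))   # sizes close to the 512 / 261 / 391 on the slide
```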
9. Feature selection preparation (Guyon et al., 2003)
  • Feature normalization to the range [0, 1]
  • Visual screening of scatter plots of the 35 real features: no obvious correlations, very few outlier samples
  • Variable ranking – assessing features one by one with the simplest possible classifier (a single threshold), one (+) vs. all (-), using four scores:
  • Fisher criterion F, scaled to [0, 1]
  • R² – squared Pearson correlation coefficient between a single feature and the +/- labels
  • AUC – Area Under the ROC (Receiver Operating Characteristic) curve
  • Sum of the previous scores (FR2AUC)
  • All scores were computed for every class, then averaged to rank the variables for all 7 classes and for the tree classes only (1, 2, 3)
  • No single feature significantly and consistently outperformed the others
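The slide lists the four ranking scores but not their implementation. A possible one-vs-all version, averaged over the given classes, is sketched below; the exact scaling and tie handling used in the study are not specified, so this is only illustrative.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def ranking_scores(X, y, classes):
    """One-vs-all ranking scores per feature, averaged over `classes`."""
    n_features = X.shape[1]
    fisher = np.zeros(n_features)
    r2 = np.zeros(n_features)
    auc = np.zeros(n_features)
    for c in classes:
        pos = (y == c)
        for j in range(n_features):
            xp, xn = X[pos, j], X[~pos, j]
            # Fisher criterion: between-class separation over within-class spread
            fisher[j] += (xp.mean() - xn.mean()) ** 2 / (xp.var() + xn.var() + 1e-12)
            # Squared Pearson correlation between the feature and the +/- labels
            r2[j] += np.corrcoef(X[:, j], pos.astype(float))[0, 1] ** 2
            # AUC of the single-threshold classifier, made direction-free
            a = roc_auc_score(pos, X[:, j])
            auc[j] += max(a, 1.0 - a)
    fisher, r2, auc = fisher / len(classes), r2 / len(classes), auc / len(classes)
    fisher = fisher / fisher.max()              # scale the Fisher score to [0, 1]
    return fisher, r2, auc, fisher + r2 + auc   # the sum plays the role of FR2AUC
```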
10. Feature selection and image classification
  • Classification accuracy on the validation set VAL (261) used as the selection score
  • Sequential Forward Selection (SFS) with three classification methods:
  • Linear Discriminant Analysis (LDA)
  • Quadratic Discriminant Analysis (quadratic LDA)
  • k-nearest neighbour (kNN) classifier, k ∈ [2, 9]; feature selection and the choice of k performed at the same time
  • Find the best minimal feature subset by a brute-force approach:
  • Take the 10 best features from the SFS
  • Retrain the best model on the whole modelling dataset (TRAIN + VAL) and test on the independent TEST set
  • The brute-force approach is tractable in this case with simple classifiers
  • It overcomes the sub-optimality of SFS
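A minimal greedy SFS loop scored on the fixed validation set, shown here with a k-NN classifier (scikit-learn's LDA/QDA estimators would plug into the same loop). This illustrates the procedure described on the slide, not the original code.

```python
from sklearn.neighbors import KNeighborsClassifier

def sfs_validation(X_train, y_train, X_val, y_val, max_features=10, k=5):
    """Greedy sequential forward selection scored by accuracy on a fixed
    validation set. Returns the selected feature indices in order of
    inclusion and the corresponding validation accuracies."""
    remaining = list(range(X_train.shape[1]))
    selected, scores = [], []
    while remaining and len(selected) < max_features:
        best_feat, best_acc = None, -1.0
        for f in remaining:
            cols = selected + [f]
            clf = KNeighborsClassifier(n_neighbors=k).fit(X_train[:, cols], y_train)
            acc = clf.score(X_val[:, cols], y_val)
            if acc > best_acc:
                best_feat, best_acc = f, acc
        selected.append(best_feat)
        remaining.remove(best_feat)
        scores.append(best_acc)
    return selected, scores
```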
11. 6-10 features are enough
  • Spectral features performed best; segment-wise features are not suited to this mixed-species study
  • Overall classification accuracy on the tree classes: over 80%
  • Probe variables were selected among the first features more often with LDA than with kNN: the linear classifier is too simple; quadratic LDA was overfitting
  • kNN with k=5 gave the best overall performance and the smallest difference between training and validation error => lower risk of overfitting
12. Example of tree species classification map
  • Pan-sharpened GeoEye image extract of 1 km x 1 km
  • Individual tree crown classification with a 5-NN classifier trained with the pure-species training data
  • Non-forest mask generated with k-means clustering + cluster labeling
  • Map classes: pine (76%), spruce (76%), deciduous (88%), non-forest
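The slide names the two map components but not their implementation. Below is a rough sketch, assuming a multi-band image array and pre-computed crown features; the number of clusters and the forest/non-forest cluster labelling are placeholders (in practice the clusters would be labelled by inspection).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def nonforest_mask(image, n_clusters=8, forest_clusters=(0, 1)):
    """Unsupervised non-forest mask: cluster the pixel spectra with k-means,
    then mark everything outside the (manually labelled) forest clusters."""
    h, w, b = image.shape
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0) \
        .fit_predict(image.reshape(-1, b))
    return ~np.isin(labels.reshape(h, w), forest_clusters)

def classify_crowns(features_train, species_train, features_crowns):
    """5-NN species classification of detected tree crowns, trained on the
    pure-species field plot samples."""
    clf = KNeighborsClassifier(n_neighbors=5).fit(features_train, species_train)
    return clf.predict(features_crowns)
```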
13. Predicted species-wise stem numbers vs. field plot data
  • Predicted stem number per species and per plot, plotted against the test data (178 test plots)
  • Systematic under-estimation of the predicted stem number for the spruce and deciduous classes
  • Noise partly due to the small collecting radius (r = 8 m) of the test data, and to geolocation differences between satellite and ground data
  • [Figure: scatter plots of predicted vs. field-measured stem numbers (stems/ha) per plot for spruce, pine and deciduous (Nspruce, Npine, Ndecid), with least-squares fits; R² between 0.24 and 0.54]
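For context, fits of the kind shown in the figure can be reproduced on any predicted-vs-measured stem number series with an ordinary least-squares line and R²; this helper is illustrative and not part of the original analysis.

```python
import numpy as np

def plot_level_agreement(true_stems_ha, pred_stems_ha):
    """Least-squares line (slope, intercept) and squared Pearson correlation
    between field-measured and predicted stems/ha over the test plots."""
    slope, intercept = np.polyfit(true_stems_ha, pred_stems_ha, 1)
    r2 = np.corrcoef(true_stems_ha, pred_stems_ha)[0, 1] ** 2
    return slope, intercept, r2
```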
14. Conclusions
  • The methodology could detect individual treetops, identify their species and determine species proportions in mixed forest.
  • Feature ranking and feature selection were performed on a set of 35 features for tree species classification.
  • Several classifiers (each model combining a feature subset and a classification method) were built. The best turned out to be 5-NN with a subset of 6 features, mostly spectral; segment-wise features could be discarded.
  • The tree species proportion accuracy was good (1.4% to 3.5%), but the correlation of stem numbers per species was not as good as expected.
  • Future work:
  • Model selection with more elaborate classifiers (e.g. SVMs)
  • Embedding feature selection into a cross-validation scheme
  • Improving stem number estimation with adaptive filtering
  • Validation of tree crown width estimation against ground data
15. Thank you
[email_address], [email_address]
Speaker notes
  • This presentation was supposed to be given on Monday, but I could not make it because the flight from London was delayed.
  • Study site: good image quality, not many clouds; spruce is the dominant species.
  • Ground reference data: 100% pure-species plots were used for training because mixed plots are not well suited to the contextual features (the radius of analysis around a tree overlaps neighbouring trees), and because this makes it possible to obtain both plot-level and stand-level errors without mixing the datasets. The training and test data were not measured in exactly the same way, nor by the same operators – one set by forest centres (public), the other by a private company.
  • Features: the 1.5 m radius captures the signature of the tree only; the 7.5 m radius captures its context (neighbouring trees). Partial mean and std cut the tails of the distributions. The shuffle probes are permutations of a feature vector over all trees / samples. Probe variables are obviously not related or correlated to the target forest variables, so the question is: are the true variables more relevant than the probes? If a variable is ranked worse than a probe for a given classification task, it should not be selected.
  • Training scheme: primary interest is in the tree classes; class proportions are kept in the splits.
  • Caution: correlation between two features does not mean one of them can be eliminated from the selection (toy examples from Guyon et al.).
  • Always start from the simplest model – linear; then nonlinear, but keep it simple.
  • Results plot: explain the axes; accuracy drops after 82%.
  • Thank you for staying until the end.