Molinier - Feature Selection for Tree Species Identification in Very High resolution Satellite Images.ppt
  • This presentation was supposed to be given on Monday but I could not make it because the flight from London was delayed.
  • Good quality, not many clouds. Spruce is the dominant species.
  • 100% pure species plots were used for training because mixed plots are not good for contextual features (radius of analysis around the tree, overlap). It is possible to obtain both plot-level error and stand-level error. Do not mix the datasets: train and test data were not measured in exactly the same way, nor by the same operators (one by public forest centres, one by a private company).
  • 1.5 m: signature of the tree only. 7.5 m: context (neighboring trees). Partial mean and std: cutting the tails of the distributions. Shuffled probes: permutation of the feature vector (all trees / samples). Probe variables are obviously not related or correlated to the target forest variables. Are the true variables more relevant than the probes? If a variable is ranked worse than a probe for a given classification task, it should not be selected.
  • Primary interest is in the tree classes. Keep class proportions.
  • ATTENTION: correlation between two features does not mean we can eliminate one from the selection (toy examples from Guyon et al.)
  • Always start from the simplest model: linear. Then nonlinear, but keep it simple.
  • Explain the axes. Accuracy drops after 82%.
  • Thank you for staying until the end

Molinier - Feature Selection for Tree Species Identification in Very High resolution Satellite Images.ppt Presentation Transcript

  • 1. Feature Selection for Tree Species Identification in Very High Resolution Satellite Images Matthieu Molinier and Heikki Astola VTT Technical Research Centre of Finland [email_address] , [email_address] IGARSS 2011 Vancouver, 28.7.2011
  • 2. Introduction
    • NewForest – Renewal of Forest Resource Mapping
    • A 1.5-year study (2009-2010) funded by The Finnish Funding Agency for Technology and Innovation (TEKES), with Finnish Companies (forest) and Research Organizations (VTT and University of Eastern Finland UEF)
    • Study motivation
    • Improve methods for operative forest inventory from remote sensing data
    • Species-wise estimates (e.g. stem volume) are not accurate enough (accuracy vs. cost)
  • 3. NewForest approach in forest variable estimation
    • Modelling based on satellite image pixel reflectances and contextual features
    • Individual tree crown (ITC) detection and crown width estimation
    • Combining data to predict total amount and size variation by species
    • Segmentation estimates
    • Refined, more accurate species-wise estimates
  • 4. Study site
    • Karttula / Kuopio,
    • Central Finland
    • 62.9007° N
    • 27.2392° E
    • Karttula GeoEye image, 26.6.2009, RGB + NIR, 10.5 km x 11.5 km, 3% clouds
    • Mixed forest, spruce dominated: 25% pine, 45% spruce, 30% deciduous (mainly birch)
  • 5. Optical image data pre-processing
    • Rectification to geographic coordinate system (WGS84, NUTM35)
    • Geo-coding corrected using Digital Elevation Model (Airborne Laser Scanning DEM) : mean corrections 2.65 m, maximum 20 m
    • Calibration to Top Of Atmosphere (TOA) reflectances using the band-specific calibration coefficients
    • Atmospheric correction to surface reflectances by applying the SMAC4 radiative transfer code
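The TOA calibration step can be illustrated with the standard radiance-to-reflectance conversion. This is a minimal sketch, not the authors' processing chain: the function name, its parameters and the example values are assumptions, and the slides give neither the GeoEye calibration coefficients nor the acquisition geometry.

```python
import math

def toa_reflectance(radiance, esun, sun_elevation_deg, earth_sun_dist_au=1.0):
    """Convert at-sensor spectral radiance to top-of-atmosphere reflectance.

    radiance: band radiance, i.e. the digital number already multiplied by
              the band-specific calibration coefficient (hypothetical input)
    esun: mean solar exoatmospheric irradiance for the same band
    sun_elevation_deg: sun elevation angle at acquisition time, in degrees
    earth_sun_dist_au: Earth-Sun distance in astronomical units
    """
    solar_zenith = math.radians(90.0 - sun_elevation_deg)
    return (math.pi * radiance * earth_sun_dist_au ** 2) / (
        esun * math.cos(solar_zenith)
    )

# Illustrative values only, not taken from the Karttula image metadata.
rho = toa_reflectance(radiance=100.0, esun=math.pi * 1000.0, sun_elevation_deg=90.0)
```

The atmospheric correction to surface reflectance (SMAC4 in the slides) is a separate step and is not sketched here.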
  • 6. Ground reference data
    • Training data – from 222 field plots
    • 212 field plots within GeoEye image area (2009)
    • 10 additional 0-stem volume plots extracted visually
    • Tree species classification : training data from 20 pure species field plots
    • Testing data – from 178 field plots (mixed species)
    • 178 field plots acquired in 2009, limited spatial distribution (several plots per forest stand)
    • In total : 1164 ground objects mapped (276 pines, 277 spruces, 347 deciduous, 264 non-trees)
    GeoEye image : 10.5 km x 11.5 km
  • 7. Input for feature selection – 35 + 4 features
    • SPECTRAL (5) – set A : R, G, B, NIR, PAN mean intensity within 1.5 m radius around tree candidates ( TC )
    • CONTEXTUAL (9) – set B : from PAN, 7.5 m radius around TC
      • mean, mean / median, skewness, kurtosis, contrast
      • pm1 : mean of brightest pixels
      • ps1 : std of brightest pixels
      • pm2 : mean of darkest pixels
      • ps2 : std of darkest pixels
    • SEGMENT-WISE (21) – set C : from PAN, 3 segment sizes : 50 m², 85 m², 125 m²
      • mean, mean / median, skewness, kurtosis
      • std : standard deviation
      • pmean : partial mean
      • pstd : partial standard deviation
    • Probe variables : random vectors or random permutations of a feature vector
      • probe_gauss1 , probe_gauss2
      • probe_shuffle1 , probe_shuffle2
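The probe variables and the brightest/darkest-pixel statistics can be sketched as follows. This is an illustration under assumptions: the slides do not state the Gaussian parameters or the fraction of pixels used for pm1/ps1/pm2/ps2, so `fraction=0.25`, the function names and the sample values are hypothetical.

```python
import random
import statistics

def make_probes(feature_column, seed=0):
    """Build two probes for one feature column: Gaussian noise and a shuffle."""
    rng = random.Random(seed)
    gauss = [rng.gauss(0.0, 1.0) for _ in feature_column]
    shuffled = list(feature_column)
    rng.shuffle(shuffled)  # same values, but decoupled from the class labels
    return {"probe_gauss": gauss, "probe_shuffle": shuffled}

def partial_mean_std(pixels, fraction=0.25, brightest=True):
    """Mean and std of the brightest (or darkest) fraction of pixel values.

    Needs at least two pixels; the fraction is an assumed parameter.
    """
    ordered = sorted(pixels, reverse=brightest)
    k = max(2, int(round(fraction * len(ordered))))
    subset = ordered[:k]
    return statistics.mean(subset), statistics.stdev(subset)

probes = make_probes([0.2, 0.4, 0.1, 0.9, 0.5])
pm1, ps1 = partial_mean_std([1, 2, 3, 4, 5, 6, 7, 8], brightest=True)
pm2, ps2 = partial_mean_std([1, 2, 3, 4, 5, 6, 7, 8], brightest=False)
```

A shuffled probe keeps the value distribution of a real feature, so a real feature that ranks below it carries no more usable information than chance.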
  • 8. Class definitions and training scheme
    • WHOLE DATASET (1164 samples) : 900 trees, 264 non-trees
    • Split into MODEL DESIGN (773, 2/3) and TESTING (391, 1/3)
    • MODEL DESIGN split into TRAINING (512, 2/3) and VAL (261, 1/3)
    • Stratified sampling to preserve class proportions
    • Uses of the model design data : model building, ranking
    • Tree classes : 1 pine, 2 spruce, 3 deciduous
    • Non-tree classes : 4 shadow, 5 open area / sunlit, 6 bare ground, 7 green vegetation
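The nested 2/3 – 1/3 stratified split can be sketched as below. This is a minimal illustration, not the authors' procedure: the function name and seed are assumptions, and the four-class label list is built from the counts on slide 6 because the per-class counts of the non-tree classes are not given. Per-class rounding means the resulting set sizes will not exactly reproduce the 773 / 391 / 512 / 261 figures on the slide.

```python
import random
from collections import defaultdict

def stratified_split(labels, first_fraction=2 / 3, seed=0):
    """Split sample indices into two parts, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    first, second = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        cut = round(first_fraction * len(idxs))  # per-class cut point
        first.extend(idxs[:cut])
        second.extend(idxs[cut:])
    return first, second

# Class counts from slide 6; the label strings are illustrative.
labels = (["pine"] * 276 + ["spruce"] * 277
          + ["deciduous"] * 347 + ["non-tree"] * 264)

design_idx, test_idx = stratified_split(labels)
# Second-level split: positions returned here index into design_labels.
design_labels = [labels[i] for i in design_idx]
train_pos, val_pos = stratified_split(design_labels)
```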
  • 9. Feature selection preparation (Guyon et al., 2003)
    • Feature normalization to the range [0, 1]
    • Visual screening of scatter plots on the 35 real features : no obvious correlations, very few outlier samples
    • Variable ranking – assessing features one by one with the simplest classifier (single threshold), one(+) vs all(-). 4 scores :
      • Fisher criterion F , scaled to [0, 1]
      • R² – squared Pearson correlation coefficient between a single feature and the +/- labels
      • AUC : Area under the ROC curve (Receiver Operating Characteristic)
      • sum of previous scores ( FR2AUC )
    • All scores computed for every class, then averaged to rank the variables for all 7 classes and for tree classes only (1,2,3).
    • No single feature significantly and consistently outperformed the others
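The three single-feature scores and their sum can be sketched for one one-vs-all task as follows. This is an illustration under assumptions: the slides do not say how F was scaled to [0, 1] (division by the largest F across features is used here), the AUC is folded so that either ordering direction counts, and all names and sample values are hypothetical.

```python
import statistics

def fisher_score(pos, neg):
    """Fisher criterion for one feature; assumes non-zero total variance."""
    num = (statistics.mean(pos) - statistics.mean(neg)) ** 2
    return num / (statistics.pvariance(pos) + statistics.pvariance(neg))

def pearson_r2(x, y):
    """Squared Pearson correlation between feature values and +/-1 labels."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def auc(pos, neg):
    """Probability that a positive outranks a negative (ties count 0.5)."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    a = wins / (len(pos) * len(neg))
    return max(a, 1.0 - a)  # folded: direction of the threshold is free

def fr2auc(features, is_positive):
    """Sum of scaled F, R2 and AUC for each named feature column."""
    labels = [1.0 if p else -1.0 for p in is_positive]
    raw = {}
    for name, values in features.items():
        pos = [v for v, p in zip(values, is_positive) if p]
        neg = [v for v, p in zip(values, is_positive) if not p]
        raw[name] = (fisher_score(pos, neg), pearson_r2(values, labels),
                     auc(pos, neg))
    f_max = max(f for f, _, _ in raw.values()) or 1.0
    return {name: f / f_max + r2 + a for name, (f, r2, a) in raw.items()}

is_positive = [True, True, True, False, False, False]
ranked = fr2auc({"good": [0.8, 0.9, 1.0, 0.0, 0.1, 0.2],
                 "noise": [0.5, 0.1, 0.9, 0.4, 0.2, 0.8]}, is_positive)
```

In the slides these scores are computed for every class and then averaged. That outer loop is omitted here.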
  • 10. Feature selection and image classification
    • Classification accuracy on validation set VAL (261) as a score
    • Sequential Forward Selection ( SFS ) with three classification methods :
      • Linear Discriminant Analysis ( LDA )
      • Quadratic LDA
      • k-nearest neighbor ( kNN ) classifier, k ∈ [2, 9]. Feature selection and choice of k at the same time.
    • Find the best minimal feature subset by a brute-force approach
      • 10 best features from the SFS
      • retrain the best model using all modeling dataset (TRAIN + VAL) and test with the independent TEST set
      • brute force approach tractable in this case with simple classifiers
      • overcome the sub-optimality of SFS
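The two-stage search (SFS scored on the validation set, then brute force over the SFS shortlist) can be sketched with a small kNN classifier. This is a minimal illustration, not the authors' implementation: all function names and the toy data are hypothetical, and the Euclidean distance and tie-breaking rules are assumptions.

```python
from collections import Counter
from itertools import combinations

def knn_predict(train_X, train_y, x, cols, k):
    """Majority vote among the k nearest training rows, using columns cols."""
    dists = sorted((sum((row[c] - x[c]) ** 2 for c in cols), label)
                   for row, label in zip(train_X, train_y))
    return Counter(label for _, label in dists[:k]).most_common(1)[0][0]

def val_accuracy(train, val, cols, k):
    """Fraction of validation samples classified correctly."""
    (tX, ty), (vX, vy) = train, val
    hits = sum(knn_predict(tX, ty, x, cols, k) == y for x, y in zip(vX, vy))
    return hits / len(vy)

def sfs(train, val, n_features, k, max_size):
    """Greedily add the feature that most improves validation accuracy."""
    selected, history = [], []
    remaining = list(range(n_features))
    while remaining and len(selected) < max_size:
        best = max(remaining,
                   key=lambda c: val_accuracy(train, val, selected + [c], k))
        selected.append(best)
        remaining.remove(best)
        history.append((list(selected), val_accuracy(train, val, selected, k)))
    return history

def brute_force(train, val, candidates, k):
    """Try every subset of the shortlist; smaller subsets win ties."""
    best_cols, best_acc = None, -1.0
    for size in range(1, len(candidates) + 1):
        for cols in combinations(candidates, size):
            acc = val_accuracy(train, val, cols, k)
            if acc > best_acc:
                best_cols, best_acc = cols, acc
    return best_cols, best_acc

# Toy data: columns 0 and 2 separate the classes, column 1 is noise.
train = ([[0.0, 0.5, 0.1], [0.1, 0.2, 0.0], [0.2, 0.9, 0.2],
          [0.9, 0.4, 0.8], [1.0, 0.1, 1.0], [0.8, 0.8, 0.9]],
         ["a", "a", "a", "b", "b", "b"])
val = ([[0.05, 0.4, 0.1], [0.95, 0.9, 0.9]], ["a", "b"])

history = sfs(train, val, n_features=3, k=1, max_size=2)
best = brute_force(train, val, candidates=[0, 1, 2], k=1)
```

With a shortlist of 10 features the brute-force stage evaluates 2^10 − 1 = 1023 subsets per value of k, which is why it stays tractable with simple classifiers. Choosing k jointly with the features would wrap these calls in a loop over k, and the final model would be retrained on TRAIN + VAL before scoring on the TEST set, as the slide describes.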
  • 11. 6-10 features are enough
    • Spectral features performed best
    • Segment-wise features not suited to mixed species study
    • Overall classification accuracy on tree classes over 80%
    • Probe variables selected more often in the first places with LDA than with kNN : linear classifier too simple
    • Quadratic LDA was overfitting
    • kNN, k=5 : best overall performance, and lowest difference from training to validation error => lower risk of overfitting
  • 12. Example of tree species classification map
    • Slide labels : pine : 76 %, spruce : 76 %, deciduous : 88 %, non-forest
    • Pan-sharpened GeoEye image extract of 1 km x 1 km
    • Individual tree crown classification with 5-NN classifier trained with pure species training data
    • Non-forest mask generated with k-means clustering + cluster labeling
  • 13. Predicted species-wise stem numbers vs. field plot data
    • Plot axes : Nspruce, Npine, Ndecid [stems/ha] against Predicted [stems/ha]
    • Predicted stem number per species plot against test data (178 test plots)
    • Systematic under-estimation of predicted stem number with spruce and deciduous classes
    • Noise partly due to the small collecting radius (r = 8 m) of test data, and to geolocation differences between satellite and ground data
    [Figure: three scatter plots of predicted vs. true number of stems per field plot, for spruces, broadleaved trees and pines, each with a linear fit. Fit annotations recovered from the plots (panel assignment not recoverable): y = 0.98x + 137.1; y = 0.33x + 239.8; y = 0.56x + 21.0; y = 0.85x + 45.0; R² = 0.24, 0.54, 0.34.]
  • 14. Conclusions
    • The methodology could detect individual treetops, identify their species and determine species proportions in mixed forest .
    • Feature ranking and feature selection were performed on a set of 35 features for tree species classification.
    • Several classifiers (each a model combining a feature subset and a classification method ) were built. The best turned out to be 5-NN with a subset of 6 features, mostly spectral . Segment-wise features could be discarded.
    • The tree species proportion accuracy was good (1.4% to 3.5%), but the correlation of stem numbers per species was not as good as expected.
    • Future work
    • Model selection with more elaborate classifiers (e.g. SVMs)
    • Embedding feature selection into a cross-validation scheme
    • Improve stem number estimation with adaptive filtering
    • Tree crown width estimation validation with ground data
  • 15. [email_address] [email_address] Thank you