Genetic Algorithms and Linear Discriminant Analysis based Dimensionality Reduction for Remotely Sensed Image Analysis
Minshan Cui, Saurabh Prasad, Majid Mahroogy, Lori Mann Bruce, James Aanstoos
Traditional Approaches (Stepwise Selection, Greedy Search, …): Stepwise LDA (S-LDA), Discriminant Analysis Feature Extraction (DAFE)
A preliminary forward selection and backward rejection step is employed to discard less relevant features.
A Linear Discriminant Analysis (LDA) projection is then applied to this reduced subset of features to further reduce the dimensionality of the feature space.
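As a rough illustration of the selection step, the sketch below implements sequential forward selection, assuming scikit-learn and cross-validated LDA accuracy as the relevance criterion (an assumption for illustration; the criterion used in S-LDA/DAFE is not specified here). The closing comment marks the limitation discussed next.

```python
# Minimal forward-selection sketch, assuming cross-validated LDA accuracy as the
# relevance criterion (the actual S-LDA criterion may differ).
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def forward_select(X, y, n_keep):
    """Greedily add the single feature that most improves accuracy until n_keep are chosen."""
    selected, remaining = [], list(range(X.shape[1]))
    while len(selected) < n_keep:
        scores = [(cross_val_score(LinearDiscriminantAnalysis(),
                                   X[:, selected + [f]], y, cv=3).mean(), f)
                  for f in remaining]
        _, best = max(scores)          # best candidate feature at this step
        selected.append(best)
        remaining.remove(best)
    return selected                    # note: earlier choices are never revisited
```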
Drawbacks
In forward selection, one cannot re-evaluate a feature that becomes irrelevant after other features have been added.
In backward rejection, one cannot re-evaluate a feature once it has been discarded.
Genetic Algorithm
Genetic algorithms are a class of optimization techniques that search for the global minimum of a fitness function.
This typically involves four steps: evaluation, reproduction, recombination, and mutation.
Genetic Algorithm (selecting 4 bands out of 10)
[Flowchart: Population → Fitness Function → Fitness Value → Rank → Reproduction (Crossover, Mutation) → Next Generation; the process repeats until one of the stopping criteria is met.]
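A toy sketch of this loop for the slide's example (select 4 bands out of 10) follows; the population size, parent choice, crossover style, mutation rate, and generation cap are illustrative choices, not the settings used in the work.

```python
import numpy as np

rng = np.random.default_rng(0)

def ga_select_bands(fitness, n_bands=10, n_select=4, pop_size=20, n_gen=100):
    """Toy GA: an individual is a set of n_select band indices; higher fitness is better."""
    pop = [rng.choice(n_bands, size=n_select, replace=False) for _ in range(pop_size)]
    for _ in range(n_gen):                                  # stop after a fixed number of generations
        scores = np.array([fitness(ind) for ind in pop])    # evaluation
        pop = [pop[i] for i in np.argsort(scores)[::-1]]    # rank (best first)
        next_pop = pop[:2]                                  # elitism: the 2 best survive unchanged
        while len(next_pop) < pop_size:                     # reproduction
            p1 = pop[rng.integers(pop_size // 2)]           # parents drawn from the better half
            p2 = pop[rng.integers(pop_size // 2)]
            child = rng.choice(np.union1d(p1, p2), size=n_select, replace=False)  # recombination
            if rng.random() < 0.1:                          # mutation: swap in an unused band
                unused = np.setdiff1d(np.arange(n_bands), child)
                child[rng.integers(n_select)] = rng.choice(unused)
            next_pop.append(child)
        pop = next_pop
    scores = np.array([fitness(ind) for ind in pop])
    return pop[int(np.argmax(scores))]                      # best band subset found
```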
Genetic Algorithm based Linear Discriminant Analysis
Use a genetic algorithm with the Bhattacharyya distance (BD) or Fisher's ratio as the fitness function to select the most relevant features in a dataset.
Bhattacharyya distance (BD)
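The equation on this slide did not survive extraction; for two Gaussian-distributed classes with means $\mu_1,\mu_2$ and covariances $\Sigma_1,\Sigma_2$, the standard Bhattacharyya distance (the form typically used as a class-separability measure in this setting) is:

```latex
B = \frac{1}{8}\,(\mu_1-\mu_2)^{\mathsf{T}}
    \left(\frac{\Sigma_1+\Sigma_2}{2}\right)^{-1}(\mu_1-\mu_2)
    + \frac{1}{2}\,
    \ln\!\left(\frac{\left|\tfrac{\Sigma_1+\Sigma_2}{2}\right|}
                     {\sqrt{\left|\Sigma_1\right|\,\left|\Sigma_2\right|}}\right)
```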
Fisher’s ratio
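Likewise, Fisher's ratio for a single feature and two classes with means $\mu_1,\mu_2$ and variances $\sigma_1^2,\sigma_2^2$ is commonly written as:

```latex
F = \frac{(\mu_1-\mu_2)^2}{\sigma_1^2+\sigma_2^2}
```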
Apply linear discriminant analysis to the selected features to further extract features.
Genetic Algorithm based Linear Discriminant Analysis
[Block diagram: Original Features → Genetic Algorithm (fitness function: Bhattacharyya distance or Fisher's ratio) → Selected Features → Linear Discriminant Analysis → Extracted Features]
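A compact sketch of the block diagram, reusing the ga_select_bands toy above and scikit-learn's LDA for the projection stage; the Fisher-ratio fitness shown assumes a two-class problem and is only one of the two fitness options described here.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def ga_lda(X_train, y_train, X_test, n_select=4):
    """GA picks a feature subset, then LDA projects that subset to at most (classes - 1) dimensions."""
    c0, c1 = np.unique(y_train)[:2]

    def fisher_fitness(bands):
        # sum of per-feature Fisher ratios over the candidate band subset (two-class case)
        a, b = X_train[y_train == c0][:, bands], X_train[y_train == c1][:, bands]
        return float(np.sum((a.mean(0) - b.mean(0)) ** 2 / (a.var(0) + b.var(0) + 1e-12)))

    bands = ga_select_bands(fisher_fitness, n_bands=X_train.shape[1], n_select=n_select)
    lda = LinearDiscriminantAnalysis().fit(X_train[:, bands], y_train)
    return lda.transform(X_train[:, bands]), lda.transform(X_test[:, bands]), bands
```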
Experimental Hyperspectral Dataset
Hyperspectral Imagery (HSI)
Acquired with NASA's AVIRIS sensor
145 × 145 pixels and 220 bands in the 400 to 2450 nm region of the visible and infrared spectrum
[Figures: ground truth of the HSI data and feature layers.]
Figure 1: A plot of reflectance versus wavelength for eight classes of spectral signatures from AVIRIS Indian Pines data.
Experimental Hyperspectral Dataset
[Figures: imagery at 3 days and 21 days after spray, showing an untreated check and application rates of 0.01, 0.02, 0.03, 0.05, 0.11, 0.22, and 0.43 kg ae/ha.]
Experimental Synthetic Aperture Radar Dataset
Synthetic Aperture Radar (SAR)
From NASA Jet Propulsion Laboratory's Unmanned Aerial Vehicle Synthetic Aperture Radar (UAVSAR)
Two classes: healthy levees and levees with landslides on them
Feature layers extracted via the gray-level co-occurrence matrix (GLCM)
[Figures: breached levee and ground truth of the SAR data. Table 1: salient characteristics of UAVSAR.]
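The SAR feature layers are described only as GLCM-derived; a minimal sketch with scikit-image follows, where the window, offsets, quantization level, and the four texture statistics are assumptions rather than the settings used for the UAVSAR data.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(patch, levels=32):
    """Texture statistics for one image window (e.g. a patch of SAR backscatter around a levee pixel)."""
    q = np.floor(patch / (patch.max() + 1e-12) * (levels - 1)).astype(np.uint8)  # quantize gray levels
    glcm = graycomatrix(q, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    # average each statistic over the two offsets/angles
    return np.array([graycoprops(glcm, p).mean()
                     for p in ("contrast", "homogeneity", "energy", "correlation")])
```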
Experiments
HSI and SAR analysis using:
 LDA
Stepwise LDA (S-LDA)
GA-LDA-Fisher (using Fisher's ratio as the fitness function in the GA)
GA-LDA-BD (using the Bhattacharyya distance as the fitness function in the GA)
Performance measure:
Overall recognition accuracies
HSI Data Experimental Results

Conclusions
GA search is very effective at selecting the most pertinent features.
Given a moderate feature-space dimensionality and sufficient training samples, LDA is a good projection-based dimensionality reduction strategy.
As the number of features increases and the training sample size decreases, methods such as GA-LDA can assist by providing a robust intermediate step of pruning away redundant and less useful features.
Thank You
Questions - Comments - Suggestions
Minshan Cui, minshan@gri.msstate.edu
How many elite, crossover, and mutate kids will be produced in the next generation?
Elite count: specifies the number of individuals that are guaranteed to survive to the next generation.
Crossover fraction: specifies the fraction of the next generation, other than elite children, that is produced by crossover.
Example: assume population size = 10, elite count = 2, and crossover fraction = 0.8.
nEliteKids = 2
nCrossoverKids = round(CrossoverFraction × (10 - nEliteKids)) = round(0.8 × (10 - 2)) = 6
nMutateKids = 10 - nEliteKids - nCrossoverKids = 10 - 2 - 6 = 2
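The same bookkeeping as the worked example above, as a small sketch:

```python
def offspring_counts(pop_size=10, elite_count=2, crossover_fraction=0.8):
    """Split the next generation into elite kids, crossover kids, and mutate kids."""
    n_elite = elite_count
    n_crossover = round(crossover_fraction * (pop_size - n_elite))   # round(0.8 * 8) = 6
    n_mutate = pop_size - n_elite - n_crossover                      # 10 - 2 - 6 = 2
    return n_elite, n_crossover, n_mutate

print(offspring_counts())   # -> (2, 6, 2)
```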
How many parents does the GA need to produce the crossover kids and mutate kids?
Since 2 parents produce 1 crossover kid and 1 parent produces 1 mutate kid, the GA will need:
nParents = 2 × nCrossoverKids + nMutateKids = 2 × 6 + 2 = 14
That is, 12 parents to produce 6 crossover kids and 2 parents to produce 2 mutate kids.
How are individuals selected to be parents?
Feature number: 1 2 3 4 5 6 7 8 9 10
Fitness value: 3.68 17.24 21.46 9.26 7.59 7.92 104.22 6.47 13.25 12.22
Lay the individuals on a line, each occupying a length proportional to its fitness (or rank), then place 14 equally spaced arrows on the line; the individuals the arrows land on are selected as parents.
Selected parents = 1 1 2 3 4 4 4 5 6 7 8 8 9 10
How are crossover and mutate kids produced?
First, randomize the order of the selected parents: parents = 3 4 4 2 1 5 10 4 1 8 6 9 6 8
12 parents produce the 6 crossover kids: parent pairs = [3 4] [4 2] [1 5] [10 4] [1 8] [6 9]
2 parents produce the 2 mutate kids: parents = [6] [8]
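A sketch of the "equally spaced arrows on a fitness-proportional line" selection described above (stochastic-universal-sampling style); negative or rank-scaled fitness values would need pre-processing, which is skipped here, and the fitness list is purely illustrative.

```python
import numpy as np

def select_parents(fitness, n_parents, rng=np.random.default_rng(0)):
    """Give each individual a line segment proportional to its fitness, place n_parents
    equally spaced pointers on the line, and return the individual each pointer lands on."""
    fitness = np.asarray(fitness, dtype=float)
    edges = np.cumsum(fitness / fitness.sum())                # right edge of each segment, ends at 1.0
    step = 1.0 / n_parents
    pointers = rng.uniform(0.0, step) + step * np.arange(n_parents)
    return np.searchsorted(edges, pointers)                   # 0-based indices of the selected parents

fitness = [3.68, 17.24, 21.46, 9.26, 7.59, 7.92, 104.22, 6.47, 13.25, 12.22]  # illustrative values
parents = select_parents(fitness, n_parents=14)
```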
Crossover
Single point:
Two parents (individuals 3 & 4):
193.2 19.736 215.74 129.85 55.08 142.92 183.95 11.855 98.849 155.64
103.73 15.757 142.21 95.922 2.9524 95.908 184.02 63.021 179.7 183.07
Crossover kid (first five entries from parent 1, the rest from parent 2):
193.2 19.736 215.74 129.85 55.08 95.908 184.02 63.021 179.7 183.07
Scattered:
Randomly produce a binary string; 1 means take the entry from the second parent, 0 means keep the entry from the first parent:
1 0 0 1 1 0 1 0 0 1
Two parents (individuals 3 & 4): as above
Crossover kid:
103.73 19.736 215.74 95.922 2.9524 142.92 184.02 11.855 98.849 183.07
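Both operators, sketched for real-valued individuals like the ones in the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def single_point_crossover(p1, p2):
    """Take entries up to a random cut point from parent 1 and the rest from parent 2."""
    cut = rng.integers(1, len(p1))
    return np.concatenate([p1[:cut], p2[cut:]])

def scattered_crossover(p1, p2):
    """A random binary string decides, entry by entry, whether the kid copies parent 2 (1) or parent 1 (0)."""
    take_p2 = rng.integers(0, 2, size=len(p1)).astype(bool)
    return np.where(take_p2, p2, p1)
```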
Mutation
Gaussian mutation: adds a random number taken from a Gaussian distribution with mean 0 to each entry of the parent vector.
Parent: 103.17 192.15 51.61 210.8 78.211 188.76 177.62 125.36 198.09 140.78
Mutate kid: 169.97 211.92 4.7027 82.935 172.73 23.559 123.35 32.11 158.63 214.26
Uniform mutation: first, the algorithm selects a fraction of the vector entries of an individual for mutation, where each entry has a probability Rate of being mutated; it then replaces each selected entry with a random number drawn uniformly from the range for that entry.
Parent: 103.17 192.15 51.61 210.8 78.211 188.76 177.62 125.36 198.09 140.78
Mutate kid: 103.17 213.45 51.61 210.8 45.231 188.76 177.62 97.56 198.09 140.78
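Minimal sketches of the two mutation operators; the noise scale, mutation rate, and [low, high] range are illustrative placeholders, not values used in the work.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_mutation(parent, scale=50.0):
    """Add a zero-mean Gaussian random number to every entry of the parent vector."""
    return np.asarray(parent, dtype=float) + rng.normal(0.0, scale, size=len(parent))

def uniform_mutation(parent, rate=0.3, low=0.0, high=255.0):
    """With probability `rate`, replace an entry by a value drawn uniformly from its allowed range."""
    parent = np.asarray(parent, dtype=float)
    mask = rng.random(len(parent)) < rate
    return np.where(mask, rng.uniform(low, high, size=len(parent)), parent)
```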
Next generation
nextGeneration = [eliteKids, crossoverKids, mutateKids]
Stopping criteria
Generations: the maximum number of iterations the genetic algorithm performs. The default is 100.
Time limit: the maximum time in seconds the genetic algorithm runs before stopping.
Fitness limit: the algorithm stops if the best fitness value is less than or equal to the value of Fitness limit.
Stall generations: the algorithm stops if the weighted average change in the fitness function value over Stall generations is less than Function tolerance.
Stall time limit: the algorithm stops if there is no improvement in the best fitness value for an interval of time in seconds specified by Stall time limit.
Function tolerance: the algorithm runs until the cumulative change in the fitness function value over Stall generations is less than or equal to Function tolerance.
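A compact sketch of assembling the next generation and of the stall-generations check (a plain average change is used here rather than the weighted average described above); the remaining criteria are simple caps on generations, wall-clock time, or best fitness.

```python
import numpy as np

def next_generation(elite_kids, crossover_kids, mutate_kids):
    """The new population is just the three groups of children concatenated."""
    return list(elite_kids) + list(crossover_kids) + list(mutate_kids)

def stalled(best_history, stall_generations=50, function_tolerance=1e-6):
    """Stop when the average change of the best fitness over the last
    `stall_generations` generations drops below `function_tolerance`."""
    if len(best_history) <= stall_generations:
        return False
    recent = np.asarray(best_history[-(stall_generations + 1):], dtype=float)
    return float(np.mean(np.abs(np.diff(recent)))) < function_tolerance
```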