IGARSS2011-I-Ling.ppt
  • Chairmen, Ladies and Gentlemen. The title of our study is "Spatial Information Based Support Vector Machine for Hyperspectral Image Classification." My name is I-Ling Chen and it's my honor to present our study here.
  • The whole paper will be introduced point by point according to this outline. First, a brief introduction to this study.
  • Many previous studies have used Support Vector Machines and Multiple Classifier Systems to mitigate the drop in classification performance caused by the Hughes phenomenon, with very good results. Notably, several of these studies also employed the SVM within a multiple classifier system. Indeed, compared with other base classifiers, the SVM achieves outstanding accuracy, but it faces two major challenges: selecting a suitable kernel function, and the long computation time required. The Hughes phenomenon is a problem that is difficult to avoid in high-dimensional data classification.
  • Although increasing the dimensionality can improve the separability of the data, when the number of training samples is insufficient the accuracy of parameter estimation deteriorates, resulting in poor classification performance. This is the Hughes phenomenon, which often occurs with small sample sizes and high-dimensional data. In terms of statistical estimation, classifying high-dimensional data therefore requires many more training samples than low-dimensional data in order to achieve good accuracy, yet collecting a large number of training samples is often a difficult and expensive task.
  • For this reason, the SVM was proposed, and in many papers it has been applied as an effective classifier for overcoming the Hughes phenomenon. It comprises the kernel-function trick and the overall process of training the classifier.
  • For classification, the main goal of the kernel method is to map the data from the original space into a feature space via a kernel function, so that samples of the same class fall into the same region and are separated from samples of other classes.
  • The main concept of the SVM is to transform the data into a feature space using a kernel function, and then to find a hyperplane that separates the data into two groups.
  • The Multiple Classifier System is another effective way to mitigate the Hughes phenomenon and to improve the performance of a single classifier. This figure shows the main framework for building classifier ensembles proposed by Kuncheva in 2004. Here, focusing on the yellow block in the middle, we discuss two methods for training multiple base classifiers on different feature subspaces, proposed by Ho in 1998 and by Yang et al. in 2010.
  • In 1998, Ho proposed the Random Subspace Method, RSM for short. In this framework we use the SVM as the learning algorithm for a brief illustration. First, decide the required ensemble size S (i.e., the total number of SVMs). Then, by random feature selection, choose w of the original d features to reduce the dimensionality and train the S classifiers, obtaining a multiple classifier system.
  • However, this method has two shortcomings. First, it does not specify how many features the dimension-reduced data should contain, that is, what subspace dimensionality is good. Second, each individual feature should, in principle, have a different discriminative power for classification, and a random selection algorithm cannot distinguish this property.
  • Therefore, in 2010 Yang et al. proposed the Dynamic Subspace Method, which uses two importance distributions to solve the two problems above. The first is the importance distribution of feature weights, which models the probability of each feature being selected; it is computed from either the class separability of LDA or the re-substitution accuracy, assigning a different selection probability to each dimension. The second, the importance distribution of subspace dimensionality, is used to automatically choose a suitable subspace size.
  • Within the same SVM-based framework, the biggest difference between DSM and RSM is that the two importance distributions are added to the feature selection step, and after each training round the re-substitution accuracy of the SVM is fed back to update the R distribution, to guide the choice of subspace size in the next round.
  • Nevertheless, we point out two important challenges that DSM still faces. First, as discussed earlier, because the performance of the SVM depends heavily on the kernel function, a suitable kernel function must still be selected. Second, there is the cost in computation time. Methods based on the SVM already spend considerable time selecting a suitable kernel function or kernel parameters; DSM in particular must use the re-substitution accuracy of every SVM to update the importance distribution, which is very time-consuming!
  • Therefore, this study uses an optimal kernel method to alleviate these problems. Among the ways to control SVM performance, selecting a suitable kernel function, or selecting suitable parameters for a kernel function, are both effective optimal kernel approaches. Here we adopt the latter, using an effective method proposed by Li et al. in 2010 to select a suitable parameter for the RBF kernel. This method exploits the property that the RBF kernel maps all samples onto a sphere in the feature space, where every sample has length 1.
  • This study combines the two importance distributions of DSM with the advantages of the optimal RBF kernel parameter method, and proposes a Kernel-based Dynamic Subspace Method, abbreviated KDSM.
  • Finally, the multiple classifier system is trained by selecting subspaces according to the probability density functions obtained from the two importance distributions, M and W, and the classification results of the resulting SVMs are fused by majority voting to determine the final class.
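    To make the fusion step concrete, here is a minimal Python sketch of majority voting over the ensemble's predictions; the array layout and function name are our own illustration, not code from the paper:

        import numpy as np

        def majority_vote(predictions):
            # predictions: (S, n) integer array; row s holds the labels
            # that the s-th SVM predicts for all n samples.
            predictions = np.asarray(predictions)
            # For each sample (column), count label occurrences and keep the winner.
            return np.array([np.bincount(col).argmax() for col in predictions.T])

        # Example: three classifiers vote on four samples.
        votes = [[0, 1, 2, 1], [0, 1, 1, 1], [2, 1, 2, 0]]
        print(majority_vote(votes))  # -> [0 1 2 1]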
  • In the experiments, we compare five classification algorithms: a single SVM using 5-fold cross-validation to select a suitable sigma within the given range; a single SVM whose sigma is found by the optimization algorithm; and three multiple classifier systems, namely DSM with two different feature-weight indices and the KDSM proposed in this study.
  • Two datasets are used. First, part of the Washington, DC Mall hyperspectral image serves as our experimental data. It contains seven classes, and every sample has 191 spectral bands as features.
  • The experimental results are discussed for three cases. In case 1, both the number of training samples per class and the total number of training samples are smaller than the dimensionality; in case 2, only the per-class training samples are fewer than the dimensionality; in case 3, both the per-class and total numbers of training samples are sufficient to avoid estimation bias. Overall, KDSM obtains the best results in every case. Moreover, all three multiple classifier systems substantially improve the accuracy of a single classifier, but their computation times differ considerably. To explain the cases: in the first, there are only twenty training samples per class, so the statistical estimates are inaccurate for every class. In the second, there are forty training samples per class; the estimates may be adequate for the whole feature space, but not for every class. In the third case, there are enough samples for every class and for the space, so all the corresponding estimates may be accurate.
  • Compared with the time KDSM takes, DSM must spend at least 12 times (click the animation) and up to 68 times the computation time to achieve comparable results.
  • In addition, we use the classification maps directly to compare how classifiers trained with different numbers of training samples classify the whole image. With 20 training samples per class, KDSM discriminates the Road class, and the grass in this region, quite well.
  • With Ni = 40, the improvement on Roof is more evident.
  • With Ni = 300, all of these regions show KDSM's superior discriminative ability compared with the other methods.
  • Finally, a brief conclusion of this study. The core concept of this paper is a multiple classifier system based on subspace selection, combining the optimization algorithm for selecting the RBF parameter with the dynamic subspace method to yield a classification method suited to high-dimensional data. The experimental results show that the proposed KDSM consistently achieves the best performance on the Washington, DC Mall dataset, and its advantage in computation time is also confirmed, meeting the research objective we set at the outset.

Presentation Transcript

  • COMBINING ENSEMBLE TECHNIQUE OF SUPPORT VECTOR MACHINES WITH THE OPTIMAL KERNEL METHOD FOR HIGH DIMENSIONAL DATA CLASSIFICATION. I-Ling Chen¹, Bor-Chen Kuo¹, Chen-Hsuan Li², Chih-Cheng Hung³. ¹Graduate Institute of Educational Measurement and Statistics, National Taichung University, Taichung, Taiwan, R.O.C. ²Department of Electrical and Control Engineering, National Chiao Tung University, Taiwan, R.O.C. ³School of Computing and Software Engineering, Southern Polytechnic State University, GA, U.S.A.
      • Introduction
          • Statement of problems
          • The Objective
      • Literature Review
          • Support Vector Machines
              • Kernel method
          • Multiple Classifier System
              • Random subspace method, Dynamic subspace method
          • An Optimal Kernel Method for selecting RBF Kernel Parameter
      • Optimal Kernel-based Dynamic Subspace Method
      • Experimental Design and Results
      • Conclusion and Future Work
    Outline
  • INTRODUCTION
  • Hughes Phenomenon (Hughes, 1968), also called the curse of dimensionality or peaking phenomenon: a small sample size N combined with high dimensionality d leads to low performance.
    • Proposed by Vapnik and coworkers (1992, 1995, 1996, 1997, 1998)
    • It is robust against the Hughes phenomenon.
    • (Bruzzone & Persello, 2009; Camps-Valls, Gomez-Chova, Munoz-Mari, Vila-Frances, & Calpe-Maravilla, 2006; Melgani & Bruzzone, 2004; Camps-Valls & Bruzzone, 2005; Fauvel, Chanussot, & Benediktsson, 2006)
    • SVM includes
      • Kernel Trick
      • Support Vector Learning
    Support Vector Machines (SVM)
  • The Goal of Kernel Method for Classification
    • The samples in the same class can be mapped into the same area.
    • The samples in the different classes can be mapped into the different areas.
    • SV learning tries to learn a linear separating hyperplane for a two-class classification problem via a given training set.
    Support Vector Learning. [Illustration of SV learning with the kernel trick: nonlinear feature mapping, the optimal separating hyperplane, margins, and support vectors.]
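    As a concrete illustration of SV learning with the kernel trick, the following minimal Python sketch (scikit-learn, with a toy dataset and parameter values chosen only for illustration) trains an RBF-kernel SVM on a two-class problem that is not linearly separable in the input space:

        import numpy as np
        from sklearn.svm import SVC

        # Toy two-class problem: a ring around a central cluster.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 2))
        y = (np.linalg.norm(X, axis=1) > 1.0).astype(int)

        # The RBF kernel implicitly maps samples into a feature space where a
        # linear separating hyperplane exists; sklearn's gamma = 1 / (2 * sigma**2).
        clf = SVC(kernel="rbf", gamma=0.5, C=10.0).fit(X, y)
        print("number of support vectors:", len(clf.support_vectors_))
        print("training accuracy:", clf.score(X, y))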
    • There are two effective approaches for generating an ensemble of diverse base classifiers via different feature subsets.
    • (Ho, T. K., 1998; Yang, J-M., Kuo, B-C., Yu, P-T., & Chuang, C-H., 2010)
    Kuncheva, L. I. (2004). Combining Pattern Classifiers: Methods and Algorithms. Hoboken, NJ: Wiley & Sons.
    • Approaches to building classifier ensembles.
    Multiple Classifier System
  • THE FRAMEWORK OF RANDOM SUBSPACE METHOD (RSM) BASED ON SVM (Ho, 1998). Given the learning algorithm, SVM, and the ensemble size, S.
  • THE INADEQUACIES OF RSM. Given the learning algorithm, SVM, and the ensemble size, S: * Irregular rule: each individual feature potentially possesses a different discriminative power for classification, and a randomized feature-selection strategy cannot distinguish informative features from redundant ones. * Implicit number: how should a suitable subspace dimensionality w be chosen for the SVM? Without an appropriate subspace dimensionality, RSM might be inferior to a single classifier.
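    A compact Python sketch of the RSM procedure described above, with SVM as the learning algorithm; the function names and the default values of S and w are our own, and the base SVMs are left untuned for brevity:

        import numpy as np
        from sklearn.svm import SVC

        def train_rsm(X, y, S=10, w=20, seed=0):
            # Train S SVMs, each on w features drawn uniformly at random
            # from the d original features (random feature selection).
            rng = np.random.default_rng(seed)
            d = X.shape[1]
            ensemble = []
            for _ in range(S):
                feats = rng.choice(d, size=w, replace=False)
                clf = SVC(kernel="rbf").fit(X[:, feats], y)
                ensemble.append((feats, clf))
            return ensemble

        def predict_rsm(ensemble, X):
            # Majority vote over the ensemble members' predictions.
            votes = np.array([clf.predict(X[:, feats]) for feats, clf in ensemble])
            return np.array([np.bincount(col).argmax() for col in votes.T])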
    • Two importance distributions
      • Importance distribution of feature weight, W distribution to model the selected probability of each feature.
      • Importance distribution of subspace dimensionality, R distribution to automatically determine the suitable subspace size.
    DYNAMIC SUBSPACE METHOD (DSM) (Yang et al., 2010). [Figure: the R distribution is initialized (R0) by kernel smoothing; the W distribution is built from the class separability of LDA or from the re-substitution accuracy of each feature, shown as feature-density plots over bands 1-191 for ML, SVM, kNN, and BCC base classifiers.]
  • THE FRAMEWORK OF DSM BASED ON SVM. Given the learning algorithm, SVM, and the ensemble size, S.
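    The following simplified Python sketch shows one DSM round under our own simplifying assumptions: W is a probability vector over the d features, R is a probability vector over the subspace sizes 1..d, and the accuracy feedback below is a crude stand-in for the kernel-smoothing update of R used in the actual method:

        import numpy as np
        from sklearn.svm import SVC

        def dsm_round(X, y, W, R, rng):
            d = X.shape[1]
            w = rng.choice(np.arange(1, d + 1), p=R)           # subspace size sampled from R
            feats = rng.choice(d, size=w, replace=False, p=W)  # features sampled from W
            clf = SVC(kernel="rbf").fit(X[:, feats], y)
            acc = clf.score(X, y)                              # re-substitution accuracy
            R = R.copy()
            R[w - 1] += acc                                    # reward this subspace size
            return (feats, clf), R / R.sum()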
  • INADEQUACIES OF DSM. Given the learning algorithm, SVM, and the ensemble size, S: * Kernel function: the SVM algorithm provides an effective way to perform supervised classification; however, the kernel function critically influences the performance of the SVM. * Time-consuming: choosing a proper kernel function or a better kernel parameter for the SVM is quite important yet ordinarily time-consuming, especially since in DSM the updated R distribution is obtained from the re-substitution accuracy.
    • The performance of the SVM depends on choosing proper kernel functions or proper parameters of a kernel function.
    • Li, Lin, Kuo, and Chu (2010) presented a novel criterion to choose a proper parameter σ of the RBF kernel function automatically.
    Gaussian Radial Basis Function (RBF) kernel: K(x_i, x_j) = exp(-||x_i - x_j||^2 / (2σ^2))
    • In the feature space determined by the RBF kernel, the norm of every sample is one, and the kernel values are positive. Hence, the samples will be mapped onto the surface of a hypersphere.
    An Optimal Kernel Method for Selecting RBF Kernel Parameter
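    To illustrate the idea of automatic parameter selection on the kernel hypersphere, here is a hedged Python sketch; the contrast criterion below (within-class kernel values should be large, between-class values small) only stands in for the actual criterion of Li et al. (2010):

        import numpy as np
        from scipy.spatial.distance import cdist

        def rbf_kernel(X1, X2, sigma):
            # K(x, z) = exp(-||x - z||^2 / (2 * sigma**2)); K(x, x) = 1, so all
            # samples have unit norm in feature space and lie on a hypersphere.
            return np.exp(-cdist(X1, X2, "sqeuclidean") / (2.0 * sigma**2))

        def select_sigma(X, y, candidates):
            # Pick the sigma that maximizes the mean within-class kernel value
            # minus the mean between-class kernel value.
            y = np.asarray(y)
            best_sigma, best_score = None, -np.inf
            for sigma in candidates:
                K = rbf_kernel(X, X, sigma)
                same = (y[:, None] == y[None, :])
                score = K[same].mean() - K[~same].mean()
                if score > best_score:
                    best_sigma, best_score = sigma, score
            return best_sigma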
  • Kernel-based Dynamic Subspace Method (KDSM)
  • THE FRAMEWORK OF KDSM. [Flowchart: Original dataset X → optimal RBF kernel algorithm → kernel space (L-dimension) → kernel-based W distribution (from feature/band separability) and M distribution (optimal RBF kernel algorithm + kernel smoothing) → kernel-based feature (band) selection → subspace pool (reduced dataset) → multiple classifiers → decision fusion (majority voting); repeated until the performance of classification is stable.]
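    To tie the flowchart together, a schematic Python sketch of the KDSM loop, reusing select_sigma from the sketch above and fusing members with majority voting; the initialization, the M distribution, and the stopping rule are all heavily simplified here:

        import numpy as np
        from sklearn.svm import SVC

        def kdsm(X, y, S=10, seed=0):
            rng = np.random.default_rng(seed)
            d = X.shape[1]
            # Optimal-kernel step: fix the RBF parameter once, up front.
            sigma = select_sigma(X, y, np.logspace(-1, 1, 20))
            gamma = 1.0 / (2.0 * sigma**2)
            W = np.full(d, 1.0 / d)   # feature-weight distribution (uniform start)
            R = np.full(d, 1.0 / d)   # subspace-size distribution (uniform start)
            ensemble = []
            for _ in range(S):        # "until performance is stable" in the full method
                w = rng.choice(np.arange(1, d + 1), p=R)
                feats = rng.choice(d, size=w, replace=False, p=W)
                clf = SVC(kernel="rbf", gamma=gamma).fit(X[:, feats], y)
                R = R.copy()
                R[w - 1] += clf.score(X, y)   # accuracy feedback into R
                R /= R.sum()
                ensemble.append((feats, clf))
            return ensemble  # fuse member predictions by majority voting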
  • Experiment Design. OP: the optimal method to choose σ; CV: 5-fold cross-validation. We use a grid search within the range [0.01, 10] (suggested by Bruzzone & Persello, 2009) to choose a proper RBF kernel parameter (2σ^2), and the set {0.1, 1, 10, 20, 60, 100, 160, 200, 1000} to choose a proper slack-variable parameter that controls the margins.

      Algorithm   Description
      SVM_CV      A single SVM without any dimension reduction, tuned with the CV method
      SVM_OP      A single SVM without any dimension reduction, tuned with the OP method
      DSM_W_ACC   DSM with the re-substitution accuracy as the feature weights
      DSM_W_LDA   DSM with the separability of Fisher's LDA as the feature weights
      KDSM        The kernel-based dynamic subspace method proposed in this research
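    A minimal Python sketch of the CV tuning just described, using scikit-learn's GridSearchCV; the grids follow the slide (2σ^2 in [0.01, 10] and the listed slack-variable set), while the density of the σ grid is our assumption:

        import numpy as np
        from sklearn.model_selection import GridSearchCV
        from sklearn.svm import SVC

        # The slide's RBF parameter is 2*sigma^2 in [0.01, 10];
        # sklearn parameterizes the kernel as gamma = 1 / (2 * sigma**2).
        two_sigma_sq = np.logspace(np.log10(0.01), np.log10(10.0), 20)
        param_grid = {
            "gamma": list(1.0 / two_sigma_sq),
            "C": [0.1, 1, 10, 20, 60, 100, 160, 200, 1000],  # slack-variable set
        }
        search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)  # 5-fold CV
        # Usage: search.fit(X_train, y_train); search.best_params_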
  • EXPERIMENTAL DATASET. Hyperspectral image data (IR image): Washington, DC Mall (number of bands d = 191); number of classes: 7. Categories (number of labeled samples): Roof (3776), Road (1982), Path (737), Grass (2870), Tree (1430), Water (1156), Shadow (840).
  • Experimental Results
    • There are three cases in Washington, DC Mall (N_i: the number of training samples in class i; N: the number of all training samples; d = 191).
    • Case 1: both N_i and N are smaller than d. Case 2: only N_i is smaller than d. Case 3: both N_i and N are large enough to avoid biased estimation.

      Method                  SVM_CV    SVM_OP   DSM_W_ACC   DSM_W_LDA   KDSM
      Case 1  Accuracy (%)    83.66     83.79    85.49       87.47       88.64
              CPU time (sec)  30.35     3.10     6045.31     2188.62     155.31
      Case 2  Accuracy (%)    86.39     87.89    88.74       89.43       92.53
              CPU time (sec)  116.02    6.65     21113.75    4883.92     308.26
      Case 3  Accuracy (%)    94.69     95.31    95.94       96.94       97.43
              CPU time (sec)  5858.18   376.99   1165048.6   220121.62   17847.7
  • Experiment Results in Washington, DC Mall. The outcome of classification using various multiple classifier systems (Ratio: CPU time relative to KDSM):

      Method      Case 1 Accuracy  Ratio    Case 2 Accuracy  Ratio    Case 3 Accuracy  Ratio
      DSM_W_ACC   85.49%           38.924   88.74%           68.493   95.94%           65.277
      DSM_W_LDA   87.47%           14.092   89.43%           15.844   96.94%           12.333
      KDSM        88.64%           1        92.53%           1        97.43%           1
  • Classification Maps with N_i = 20 in Washington, DC Mall. □ Background ■ Water ■ Tree ■ Path ■ Grass ■ Roof ■ Road ■ Shadow. SVM_CV, SVM_OP, DSM_W_ACC, DSM_W_LDA, KDSM
  • Classification Maps (roof) with N_i = 40. □ Background ■ Water ■ Tree ■ Path ■ Grass ■ Roof ■ Road ■ Shadow. SVM_CV, SVM_OP, DSM_W_ACC, DSM_W_LDA, KDSM
  • Classification Maps with N_i = 300 in Washington, DC Mall. □ Background ■ Water ■ Tree ■ Path ■ Grass ■ Roof ■ Road ■ Shadow. SVM_CV, SVM_OP, DSM_W_ACC, DSM_W_LDA, KDSM
    • In this paper, the core of the presented method, KDSM, is to apply both the optimal algorithm for selecting the proper RBF parameter and the dynamic subspace method within a subspace-selection-based MCS, to improve classification results on high-dimensional datasets.
    • The experimental results showed that the classification accuracies of KDSM are invariably the best among all classifiers in each case of the Washington, DC Mall dataset.
    • Moreover, these results show that, compared with DSM, KDSM not only obtains more accurate classification but also economizes on computation time.
    Conclusions
  • Thank You