Support vector classification



Supplementary material

Support vector classification

A support vector machine (SVM) is an example of a supervised, multivariate classification method. SVMs are supervised in the sense that they are ‘trained’ to learn about differences between the groups to be classified. The method has previously been applied to neuroimaging data (Fan et al., 2005a; Fan et al., 2005b; Kawasaki et al., 2006; Lao et al., 2004; Mourao-Miranda et al., 2005). Because the image data do not need to satisfy the assumptions of Gaussian random field theory, image smoothing is unnecessary. SVMs are related to other multivariate methods such as canonical variate analysis, a method successfully applied to FA-images of patients with Alzheimer’s disease (Teipel et al., 2007).

General account of SVMs

Here we give a short account of SVMs to convey the basics of the method to the non-technical reader. In the context of machine learning, individual MR images are treated as points located in a high-dimensional space. Supplementary figure 1 illustrates this in a two-dimensional space for simplicity: the circles and squares cannot be separated by their values along a single dimension. Only a combination of the two dimensions allows reliable separation.
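As a toy illustration of this point, the sketch below uses two small groups of invented two-dimensional points (they are not taken from the study). The value ranges of the groups overlap along each single dimension, yet a simple linear combination of both dimensions separates them. All names and coordinates are assumptions made for the example.

```python
# Hypothetical 2-D "images": each point has two feature values (x1, x2).
circles = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
squares = [(2.0, 1.0), (3.0, 2.0), (4.0, 3.0)]


def ranges_overlap(values_a, values_b):
    """True if the value ranges of two groups overlap, so that no single
    threshold on these values can separate the groups."""
    return max(min(values_a), min(values_b)) <= min(max(values_a), max(values_b))


def combined_score(point):
    """A linear combination of both dimensions (here x2 - x1)."""
    x1, x2 = point
    return x2 - x1


# Neither dimension separates the groups on its own ...
overlap_x1 = ranges_overlap([p[0] for p in circles], [p[0] for p in squares])
overlap_x2 = ranges_overlap([p[1] for p in circles], [p[1] for p in squares])

# ... but the combination does: every circle scores above zero and
# every square scores below zero.
circle_scores = [combined_score(p) for p in circles]
square_scores = [combined_score(p) for p in squares]
separable = min(circle_scores) > 0 > max(square_scores)

print(overlap_x1, overlap_x2, separable)
```

The line `x2 - x1 = 0` plays the role of the decision boundary in this two-dimensional example.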
Supplementary figure 1: Illustrating the concept of group separation in a high-dimensional space and the concept of a decision boundary.

In practical terms, a linear kernel matrix is created from the normalised FA-images. To this end, the pairwise "dot product" of each scan with every other scan is computed. For a given pair of scans, each voxel value in one scan is multiplied by the corresponding voxel value (i.e. the voxel from the same point in the brain) in the other scan. The resulting products (as many as there are voxels in each FA-image) are then summed to form one element of the kernel matrix. Each row or column of the matrix holds the dot products of one scan with all the other scans, and the diagonal is formed by the dot product of each scan with itself. Intuitively, the kernel matrix can be viewed as a similarity measure among subjects belonging to a characterised group.

The voxels are effectively treated as coordinates of a high-dimensional space, so there are normally as many dimensions as there are voxels in the FA-map. Images are distributed in this space, and their location is determined by the intensity value of each voxel. The images do not span the whole high-dimensional space, but rather cluster in subspaces that contain very similar images. This is one reason why image normalisation into a standard space is an important pre-processing step: good normalisation will tighten this grouping and reduce dimensionality.

An SVM used for classification is an example of a linear discriminative model. The basic model is a binary classifier, which means that it divides the space in which the MR images are distributed into two classes by finding a separating hyperplane. In a simple two-dimensional space this would be a separating line or boundary (see figure), but in higher-dimensional spaces it is called a hyperplane. Fisher’s linear discriminant analysis and the linear perceptron can both derive linear discriminant hyperplanes. The principle behind an SVM, however, is “structural risk minimisation”: training aims to find the hyperplane that maximises the distance between the two classes.

Intuitively, it is reasonably clear that any optimal separating hyperplane (OSH) in an SVM is mostly defined by the data samples that lie close to the separating boundary between the two classes, i.e. those samples which are most ambiguous. These training samples are called the “support vectors”. Samples which are further away from the separating boundary are distinctively different and hence are not used to calculate the OSH. This suggests that adding more samples to a training set may not improve the definition of an OSH if the new samples lie far from it.

After training an SVM, the OSH defines the learned differences between the groups (in our case PSCs and controls). At this point, it is important to know how well this separation will generalise, as it is possible that the OSH is specific only to the data used for training.
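The construction of the linear kernel matrix described above can be sketched as follows. This is a minimal illustration in plain Python, not the code used in the study: the function names and the tiny "scans" are hypothetical, and real FA-maps contain many thousands of voxels per scan.

```python
def dot(scan_a, scan_b):
    """Multiply corresponding voxel values of two scans and sum the products."""
    if len(scan_a) != len(scan_b):
        raise ValueError("scans must have the same number of voxels")
    return sum(a * b for a, b in zip(scan_a, scan_b))


def linear_kernel_matrix(scans):
    """Return the n x n matrix of pairwise dot products between n scans."""
    n = len(scans)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            value = dot(scans[i], scans[j])
            kernel[i][j] = value
            kernel[j][i] = value  # the dot product is symmetric
    return kernel


# Hypothetical data: three "scans" of four voxels each.
scans = [
    [0.2, 0.4, 0.1, 0.3],
    [0.2, 0.5, 0.1, 0.2],
    [0.6, 0.1, 0.4, 0.1],
]
K = linear_kernel_matrix(scans)
for row in K:
    print([round(v, 3) for v in row])
```

However many voxels each scan contains, the kernel matrix has only one row and one column per scan. A matrix of this kind could then be handed to an SVM implementation that accepts precomputed kernels (for example, scikit-learn's `SVC(kernel="precomputed")`); whether the original analysis proceeded this way is an assumption.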
A validation step is therefore used to assess the accuracy of the classifier, i.e. how well it generalises to other data. A number of methods are available for this; one such method is leave-one-out cross-validation. This procedure iteratively repeats SVM training, each time leaving a single image out of the training set. After each training step, a prediction is made for the excluded image and compared with the ground truth. By leaving out each of the images in turn, it is possible to determine the accuracy with which the classifier will generalise to new data. Importantly, no image is ever part of both the training set and the test set within a given validation step. This is further illustrated in the textbooks cited below.

In addition to testing whether a specific pattern of white matter changes exists, we were interested in determining which pattern of voxels is most relevant for classification. During training, the SVM assigns a specific weight to every image, reflecting the importance of that scan in separating our two groups; the weight is zero for non-contributing images. The weights are multiplied by a label vector indicating which group each image belongs to (e.g. -1 for PSC and +1 for controls). Each image is then multiplied by the product of its label and weight. The weighted images from each group are then summed, resulting in a value for each voxel that indicates how important it is for discrimination. The interested reader is referred to the following textbooks (Bishop, 2006; Vapnik, 1998).

Voxel based analysis of T1-weighted data

VBM-methods

All T1 weighted images were analysed using SPM5 ( Images were segmented into grey matter and white matter and normalised to MNI space using a unified approach developed by Ashburner and colleagues (Ashburner and Friston, 2005). This technique employs prior tissue probability maps (TPMs) for each tissue class that code the probability of each voxel belonging to a given tissue class. The intensity distribution of voxels from each class is modelled as a mixture of Gaussians. After an initial affine normalisation step, the TPMs are warped to fit the individual T1 images. Parameters for bias correction, tissue classification and spatial normalisation are iteratively estimated from the same generative model.
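As a rough sketch of the mixture-of-Gaussians model described above, the intensity y_i of voxel i can be written as a weighted sum of Gaussians whose weights come from the TPMs. The notation is ours and deliberately simplified: it assumes one Gaussian per tissue class and omits the bias field and the warping of the TPMs.

```latex
p(y_i) = \sum_{k=1}^{K} \pi_{ik}\, \mathcal{N}\!\left(y_i \mid \mu_k, \sigma_k^2\right),
\qquad
\pi_{ik} = \frac{\gamma_k\, b_{ik}}{\sum_{j=1}^{K} \gamma_j\, b_{ij}}
```

Here b_{ik} is the value of the TPM for class k at voxel i, \gamma_k is a mixing proportion, and \mu_k and \sigma_k^2 are the mean and variance of the intensities in class k.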
An additional step, usually referred to as modulation, is included to compensate for the effect of spatial normalisation. This step involves multiplying the spatially normalised segmented images by their relative volume before and after spatial normalisation (Ashburner and Friston, 2000). After this step, the value of each voxel represents a measure of the local volume of that tissue class. Finally, we smoothed the data using an isotropic Gaussian smoothing kernel of 10 mm (full width at half maximum). This was done to render the data more normally distributed and to account for the inexact nature of the normalisation process. The two groups were compared with two-sample t-tests. We display the results (supplementary Fig. 2) at an exploratory threshold of p<0.01 (uncorrected).

VBM-results

Supplementary Fig. 2. Results when testing for areas with a greater local grey matter volume in controls compared to PSC. Results are overlaid on a single subject’s image in MNI-space.

The striatum bilaterally, as well as the adjacent insular cortex, showed greater local grey matter volume in controls compared to PSCs. Relatively few cortical areas were found to differ between the groups, even at this liberal exploratory threshold of p<0.01. The region indicated by the cross-hairs (x, y, z = -36, -18, 44 in MNI-space; T-score = 3.7; uncorrected p-value = 0.001; FWE-corrected p-value = 1.00) did not correlate with subject-specific levels of voluntary-guided saccade impairment.
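The two-sample t-test used for the group comparison above can be sketched for a single voxel as follows. This is a minimal illustration that assumes equal variances in both groups (a pooled-variance Student's t-test); the settings actually used in SPM5 may differ, and the function name and data values are hypothetical.

```python
import math
from statistics import mean, variance


def two_sample_t(group_a, group_b):
    """Student's two-sample t statistic with pooled variance.

    Returns (t, degrees_of_freedom). Equal variances are assumed.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two values")
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    standard_error = math.sqrt(pooled_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical smoothed, modulated grey matter values at one voxel.
controls = [0.62, 0.58, 0.65, 0.60, 0.63]
pscs = [0.55, 0.57, 0.52, 0.59, 0.54]
t_value, df = two_sample_t(controls, pscs)
print(f"t = {t_value:.2f}, df = {df}")
```

In a voxel-based analysis this statistic is computed at every voxel, which is why correction for multiple comparisons (such as the FWE correction reported above) matters when interpreting individual peaks.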
References

Ashburner J, Friston KJ. Voxel-based morphometry--the methods. Neuroimage 2000; 11: 805-21.
Ashburner J, Friston KJ. Unified segmentation. Neuroimage 2005; 26: 839-51.
Bishop C. Pattern recognition and machine learning. New York: Springer, 2006.
Burges C. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 1998; 2: 121-167.
Fan RE, Chen PH, Lin CJ. Working set selection using the second order information for training SVM. Journal of Machine Learning Research 2005a; 6: 1889-1918.
Fan Y, Shen D, Davatzikos C. Classification of structural images via high-dimensional image warping, robust feature extraction, and SVM. Med Image Comput Comput Assist Interv Int Conf Med Image Comput Comput Assist Interv 2005b; 8: 1-8.
Kawasaki Y, Suzuki M, Kherif F, Takahashi T, Zhou SY, Nakamura K, et al. Multivariate voxel-based morphometry successfully differentiates schizophrenia patients from healthy controls. Neuroimage 2006.
Lao Z, Shen D, Xue Z, Karacali B, Resnick SM, Davatzikos C. Morphological classification of brains via high-dimensional shape transformations and machine learning methods. Neuroimage 2004; 21: 46-57.
Mourao-Miranda J, Bokde AL, Born C, Hampel H, Stetter M. Classifying brain states and determining the discriminating activation patterns: Support Vector Machine on functional MRI data. Neuroimage 2005; 28: 980-95.
Teipel SJ, Stahl R, Dietrich O, Schoenberg SO, Perneczky R, Bokde AL, et al. Multivariate network analysis of fiber tract integrity in Alzheimer's disease. Neuroimage 2007; 34: 985-95.
Tipping M. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 2001; 1: 211-244.
Vapnik V. Statistical Learning Theory. New York: Wiley Interscience, 1998.