Support vector classification

Supplementary material

Support vector classification

A support vector machine (SVM) is an example of a supervised, multivariate classification method. SVMs are supervised in the sense that they are ‘trained’ to learn about differences between the groups to be classified. The method has previously been applied to neuroimaging data (Fan et al., 2005a; Fan et al., 2005b; Kawasaki et al., 2006; Lao et al., 2004; Mourao-Miranda et al., 2005). The image data do not need to satisfy the assumptions of Gaussian random field theory, so image smoothing is unnecessary.

SVMs are related to other multivariate methods such as canonical variate analysis, a method successfully applied to FA-images of patients with Alzheimer’s disease (Teipel et al., 2007).

General account of SVMs

Here we give a short account of SVMs to convey the basics of the method to the non-technical reader. In the context of machine learning, individual MR images are treated as points located in a high-dimensional space. Supplementary figure 1 illustrates this in a two-dimensional space for simplicity: the circles and squares cannot be separated by their values along a single dimension. Only a combination of the two dimensions allows reliable separation.

Supplementary figure 1: Illustrating the concept of group separation in a high-dimensional space and the concept of a decision boundary.
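As a toy illustration of the idea shown in the figure, the following Python sketch builds two groups whose values overlap along each axis but are separated by a combination of both axes. The coordinates and the helper names (`ranges_overlap`, `axis_values`) are made up for this example and are not taken from the figure or from the study data.

```python
# Hypothetical 2-D points; not data from the study.
circles = [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
squares = [(2.0, 1.0), (3.0, 2.0), (4.0, 3.0)]


def ranges_overlap(a, b):
    """True if the value ranges of a and b overlap, so no single threshold separates them."""
    return max(min(a), min(b)) <= min(max(a), max(b))


def axis_values(points, axis):
    """Values of all points along one axis (0 for x, 1 for y)."""
    return [p[axis] for p in points]


# Neither dimension separates the two groups on its own.
overlap_x = ranges_overlap(axis_values(circles, 0), axis_values(squares, 0))
overlap_y = ranges_overlap(axis_values(circles, 1), axis_values(squares, 1))

# A linear combination of both dimensions (here y - x) does separate them;
# the corresponding decision boundary is the line y = x.
circle_scores = [y - x for x, y in circles]
square_scores = [y - x for x, y in squares]
separable = not ranges_overlap(circle_scores, square_scores)

print(overlap_x, overlap_y, separable)
```

In this toy case the boundary was chosen by eye; an SVM would instead learn the boundary from the training data.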

In practical terms, a linear kernel matrix is created from the normalised FA-images. To this end, the "dot product" of each scan with every other scan is computed. For a given pair of scans, each voxel value in one scan is multiplied by the corresponding voxel value (i.e. the voxel from the same point in the brain) in the other scan. The products (as many as there are voxels in each FA-image) are then summed to form one element of the kernel matrix. Each row or column of the matrix therefore holds the dot products of one scan with all the scans, and the diagonal holds the dot product of each scan with itself. Intuitively, the kernel matrix can be viewed as a similarity measure among subjects belonging to a characterised group. The voxels are effectively treated as the coordinates of a high-dimensional space, which has as many dimensions as there are voxels in the FA-map. Each image is a point in this space whose location is determined by the intensity value of each voxel. The images do not span the whole high-dimensional space, but rather cluster in subspaces that contain very similar images. This is one reason why image normalisation into a standard space is an important pre-processing step: good normalisation will tighten this grouping and reduce the effective dimensionality.
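The construction of the kernel matrix described above can be sketched in a few lines of Python. This is a minimal illustration using plain lists; the function name `linear_kernel_matrix` and the voxel values are assumptions made for this example, and real FA-images would contain many thousands of voxels.

```python
def linear_kernel_matrix(scans):
    """Return the matrix of pairwise dot products between scans.

    `scans` is a list of equal-length lists, one flattened image per subject.
    """
    n = len(scans)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Multiply corresponding voxels and sum the products.
            value = sum(a * b for a, b in zip(scans[i], scans[j]))
            kernel[i][j] = value
            kernel[j][i] = value  # the dot product is symmetric
    return kernel


# Three hypothetical "scans" with four voxels each (made-up values).
scans = [
    [0.5, 0.25, 0.0, 1.0],
    [0.5, 0.5, 0.25, 0.0],
    [0.0, 1.0, 0.5, 0.5],
]
K = linear_kernel_matrix(scans)
for row in K:
    print(row)
```

The result is a square matrix with one row and one column per scan, so its size depends on the number of subjects rather than on the number of voxels.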

An SVM used for classification is an example of a linear discriminative model. The basic model is a binary classifier, which means it divides the space in which the MR images are distributed into two classes by finding a separating hyperplane. In a simple two-dimensional space this would be a separating line or boundary (see Supplementary figure 1), but in higher-dimensional spaces it is called a hyperplane. Fisher’s linear discriminant analysis and the linear perceptron can both derive linear discriminant hyperplanes. The motivation behind an SVM, however, is “structural risk minimization”, which aims to find the hyperplane that maximises the margin, i.e. the distance between the hyperplane and the closest training samples of each class. Intuitively, the optimal separating hyperplane (OSH) of an SVM is defined mostly by the data samples that lie close to the separating boundary between the two classes, i.e. those samples that are most ambiguous. These training samples are called the “support vectors”. Samples that lie further from the separating boundary are distinctly different and hence are not used to calculate the OSH. This suggests that adding more samples to a training set may not improve the definition of an OSH if the new samples lie further from the OSH than the existing support vectors.
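For readers who prefer a formula, the standard textbook formulation of the maximum-margin problem for linearly separable data is given below. The symbols are introduced here for illustration and are not used elsewhere in this text: x_i is the i-th training image written as a vector, y_i is its group label (–1 or +1), w is the normal vector of the hyperplane and b is its offset.

```latex
\min_{\mathbf{w},\, b} \; \tfrac{1}{2}\,\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \left( \mathbf{w}^{\top} \mathbf{x}_i + b \right) \ge 1
\quad \text{for all } i .
```

In this formulation the margin between the two classes equals 2/||w||, so minimising ||w|| maximises the margin. The support vectors are the training samples for which the constraint holds with equality. In practice a soft-margin variant that tolerates some misclassified samples is commonly used; see the textbooks cited below for details.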

After training an SVM, the OSH defines the learned differences between groups (in our case PSCs and controls). At this point, it is important to know how well this separation will generalise, as it is possible that the OSH is specific only to the data used for training. Therefore, a validation step is used to assess how accurately the classifier generalises to other data. A number of methods are available for this; one such method is leave-one-out cross-validation. This procedure repeats SVM training iteratively, each time leaving a single image out of the training set. After each training step, a prediction is made for the excluded image and compared with the ground truth. By leaving out each of the images in turn, it is possible to estimate the accuracy with which the classifier will generalise to new data. It is important to note that, within any given iteration, no image is part of both the training set and the test set. This is further illustrated in the textbooks cited below.
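The leave-one-out procedure can be sketched as follows. To keep the example short and self-contained, a simple nearest-class-mean classifier stands in for the SVM; this substitution, the function names and the two-voxel "images" are all assumptions made for illustration and are not part of the analysis described here.

```python
def nearest_mean_predict(train_x, train_y, test_x):
    """Stand-in classifier (NOT an SVM): return the label of the closest class mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        mean = [sum(col) / len(members) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(test_x, mean))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out_accuracy(images, labels, predict):
    """Train on all images but one, test on the held-out image, repeat for every image."""
    correct = 0
    for i in range(len(images)):
        # The held-out image i is excluded from the training set.
        train_x = images[:i] + images[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, images[i]) == labels[i]:
            correct += 1
    return correct / len(images)


# Made-up, well-separated two-voxel "images"; -1 and +1 are the group labels.
images = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
labels = [-1, -1, -1, 1, 1, 1]
accuracy = leave_one_out_accuracy(images, labels, nearest_mean_predict)
print(accuracy)
```

Because the held-out image is removed before each training step, the reported accuracy is always computed on data the classifier has not seen.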

In addition to testing whether a specific pattern of white matter changes exists, we were interested in determining which pattern of voxels is most relevant for classification. During the training process the SVM assigns a specific weight to every image, reflecting the importance of that scan in separating our two groups; the weight is zero for non-contributing images. Each weight is multiplied by the label indicating which group the image belongs to (e.g. –1 for PSC and +1 for controls), and each image is then multiplied by the resulting product of its label and weight. Finally, the scaled images are summed across both groups, resulting in a value for each voxel that indicates how important it is for discrimination.
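The computation of the voxel-wise map can be sketched as below, assuming the per-image weights are already available from training. The function name `discrimination_map`, the weights and the image values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def discrimination_map(images, labels, weights):
    """Sum each image scaled by (label * weight), giving one value per voxel."""
    n_voxels = len(images[0])
    voxel_map = [0.0] * n_voxels
    for image, label, weight in zip(images, labels, weights):
        scale = label * weight  # zero for images that do not contribute
        for v in range(n_voxels):
            voxel_map[v] += scale * image[v]
    return voxel_map


# Made-up values: four images with three voxels each.
images = [
    [0.5, 0.25, 1.0],
    [0.25, 0.5, 1.0],
    [1.0, 0.5, 1.0],
    [0.75, 0.25, 1.0],
]
labels = [-1, -1, 1, 1]         # -1 for PSC, +1 for controls, as in the text
weights = [2.0, 0.0, 2.0, 0.0]  # hypothetical per-image weights from training
w_map = discrimination_map(images, labels, weights)
print(w_map)
```

In this toy example the third voxel has the same value in both contributing images, so its entry in the map is zero: it carries no information for separating the groups.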

The interested reader is referred to the following textbooks (Bishop, 2006; Vapnik, 1998).

Voxel based analysis of T1-weighted data

VBM-methods

All T1-weighted images were analysed using SPM5 (www.fil.ion.ucl.ac.uk/spm/). Images were segmented into grey matter and white matter and normalised to MNI space using a unified approach developed by Ashburner and colleagues (Ashburner and Friston, 2005). This technique employs prior tissue probability maps (TPMs) for each tissue class that code the probability of each voxel belonging to a given tissue class. The intensity distribution of voxels from each class is modelled as a mixture of Gaussians. After an initial affine normalisation step, the TPMs are warped to fit the individual T1 images. Parameters for bias correction, tissue classification and spatial normalisation are iteratively estimated from the same generative model. An additional step, usually referred to as modulation, is included to compensate for the effect of spatial normalisation. This step involves multiplying the spatially normalised segmented images by their relative volume before and after spatial normalisation (Ashburner and Friston, 2000). After this step, the value of each voxel represents a measure of the local volume of that tissue class. Finally, we smoothed the data using an isotropic Gaussian smoothing kernel of 10 mm (full width at half maximum). This was done to render the data more normally distributed and to account for the inexact nature of the normalisation process. The two groups were compared with two-sample t-tests. We display the results (Supplementary Fig. 2) at an exploratory threshold of p<0.01 (uncorrected).
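To make the final statistical step concrete, the sketch below computes a two-sample t statistic for a single voxel. It uses the classical pooled-variance (equal variance) form, which is an assumption made here; SPM fits a general linear model at every voxel and its variance settings may differ. The function name and the voxel values are made up for illustration.

```python
import math


def two_sample_t(group_a, group_b):
    """Student's two-sample t statistic (pooled variance) and degrees of freedom."""
    n_a, n_b = len(group_a), len(group_b)
    mean_a = sum(group_a) / n_a
    mean_b = sum(group_b) / n_b
    # Sums of squared deviations from each group mean.
    ss_a = sum((x - mean_a) ** 2 for x in group_a)
    ss_b = sum((x - mean_b) ** 2 for x in group_b)
    dof = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / dof
    t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, dof


# Made-up smoothed, modulated grey matter values at one voxel.
controls = [0.62, 0.58, 0.65, 0.60, 0.63]
pscs = [0.55, 0.57, 0.52, 0.59, 0.54]
t_value, dof = two_sample_t(controls, pscs)
print(round(t_value, 2), dof)
```

A positive t value indicates a larger mean in the first group, which here corresponds to testing for greater local volume in controls than in PSCs. In a voxel-based analysis the same test is repeated at every voxel, which is why correction for multiple comparisons matters when interpreting the resulting map.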

VBM-results

Supplementary Fig. 2. Results when testing for areas with a greater local grey matter volume in controls compared to PSCs. Results are overlaid on a single subject’s image in MNI-space. The striatum bilaterally as well as adjacent insular cortex were found to show reduced local grey matter volume in PSCs compared to controls. Relatively few cortical areas were found to differ between the groups even with this liberal exploratory threshold of p<0.01. Local grey matter volume in the region indicated by the cross-hairs (x, y, z = -36, -18, 44 in MNI-space; T-score = 3.7; uncorrected p = 0.001; FWE-corrected p = 1.00) did not correlate with subject-specific levels of voluntary-guided saccade impairment.

References

Ashburner J, Friston KJ. Voxel-based morphometry--the methods. Neuroimage 2000; 11: 805-21.

Ashburner J, Friston KJ. Unified segmentation. Neuroimage 2005; 26: 839-51.

Bishop C. Pattern recognition and machine learning. New York: Springer, 2006.

Burges C. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 1998; 2: 121-167.

Fan RE, Chen PH, Lin CJ. Working set selection using the second order information for training SVM. Journal of Machine Learning Research 2005a; 6: 1889-1918.

Fan Y, Shen D, Davatzikos C. Classification of structural images via high-dimensional image warping, robust feature extraction, and SVM. Med Image Comput Comput Assist Interv Int Conf Med Image Comput Comput Assist Interv 2005b; 8: 1-8.

Kawasaki Y, Suzuki M, Kherif F, Takahashi T, Zhou SY, Nakamura K, et al. Multivariate voxel-based morphometry successfully differentiates schizophrenia patients from healthy controls. Neuroimage 2006.

Lao Z, Shen D, Xue Z, Karacali B, Resnick SM, Davatzikos C. Morphological classification of brains via high-dimensional shape transformations and machine learning methods. Neuroimage 2004; 21: 46-57.

Mourao-Miranda J, Bokde AL, Born C, Hampel H, Stetter M. Classifying brain states and determining the discriminating activation patterns: Support Vector Machine on functional MRI data. Neuroimage 2005; 28: 980-95.

Teipel SJ, Stahl R, Dietrich O, Schoenberg SO, Perneczky R, Bokde AL, et al. Multivariate network analysis of fiber tract integrity in Alzheimer's disease. Neuroimage 2007; 34: 985-95.

Tipping M. Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 2001; 1: 211-244.

Vapnik V. Statistical Learning Theory. New York: Wiley Interscience, 1998.
