INTEGRATION OF SUPPORT VECTOR MACHINES AND MARKOV RANDOM FIELDS FOR REMOTE SENSING IMAGE CLASSIFICATION
Paolo Irrera, Gabriele Moser, Sebastiano B. Serpico
University of Genoa, Dept. of Biophysical and Electronic Eng. (DIBE), Via Opera Pia 11a, I-16145 Genoa, Italy
Remote sensing image classification
Objective of the paper
Support vector machines
Markov random fields
Markovian proposed method
Architecture of the method
REMOTE SENSING IMAGE CLASSIFICATION
Techniques that aim at labeling each image pixel as belonging to a thematic class.
Examples of applications:
land-use or land-cover mapping;
Many approaches have been proposed for supervised classification:
parametric and nonparametric Bayesian;
support vector machines (SVMs),
OBJECTIVE OF THE PAPER
Key idea of SVMs:
identifying an optimal linear discriminant hypersurface in a suitable nonlinearly transformed feature space.
Good analytical properties (generalization capability) and excellent performance in many applications (e.g., object recognition, hyperspectral image classification).
SVMs focus on i.i.d. (independent and identically distributed) samples;
in image classification, this implies an intrinsically noncontextual approach.
Objective of the paper:
integration of the SVM and Markov random field (MRF) approaches to classification, aiming at a rigorous contextual generalization of SVMs.
An SVM exploits the information associated with the samples located at the interface between distinct classes (support vectors).
Training is expressed as a quadratic programming problem.
The nonlinear transformation of the feature space is implicitly defined by a kernel function K(x,y), which allows a nonlinear problem to be formalized as a linear problem without a significant increase in computational complexity.
Here, we use a Gaussian kernel.
[Equations: quadratic programming problem; discriminant function (nonlinear case, two classes)]
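For reference, the two expressions named above are commonly written as follows in the SVM literature. This is the standard soft-margin dual formulation and may differ in notation from the original slide; the training samples $(x_i, y_i)$ with $y_i \in \{-1, +1\}$, $i = 1, \dots, N$, the multipliers $\alpha_i$, the bias $b$, and the kernel width $\sigma^2$ are notational assumptions introduced here.

```latex
\[
\max_{\alpha}\ \sum_{i=1}^{N}\alpha_i
  - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\,\alpha_j\,y_i\,y_j\,K(x_i,x_j)
\quad\text{subject to}\quad
0 \le \alpha_i \le C,\qquad \sum_{i=1}^{N}\alpha_i\,y_i = 0
\]
\[
f(x) = \sum_{i=1}^{N}\alpha_i\,y_i\,K(x_i,x) + b,
\qquad
K(x,y) = \exp\!\left(-\frac{\lVert x-y\rVert^{2}}{2\sigma^{2}}\right)
\]
```

A sample $x$ is assigned to one class or the other according to the sign of $f(x)$; only the support vectors have $\alpha_i > 0$ and contribute to the sum.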
MARKOV RANDOM FIELDS
MRFs constitute a general family of stochastic models for the contextual information associated with an image, in Bayesian image-analysis problems.
They allow global stochastic models to be formalized according to the local statistical relationships among neighboring pixels (Hammersley-Clifford theorem).
When modeling the random field of the thematic class labels as an MRF, the “maximum a posteriori” (MAP) criterion can be formalized as the minimization of a suitable energy function.
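As an illustration only, a frequently used energy of this kind combines a pixelwise data term with a Potts-type spatial term. The specific form below, the weight $\beta$, and the neighborhood relation $j \sim i$ are assumptions chosen for the example, not necessarily the energy shown on the original slide.

```latex
\[
U(y \mid x) = \sum_{i}\left[\, -\ln p(x_i \mid y_i)
  \;-\; \beta \sum_{j \sim i} \delta(y_i, y_j) \right]
\]
```

Here $x_i$ and $y_i$ are the feature vector and class label of pixel $i$, $\delta$ is the Kronecker delta, and a larger $\beta$ rewards neighboring pixels that share the same label.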
INTEGRATING MRF AND SVM
Here, we prove that, under proper assumptions, the Markovian minimum-energy decision rule can be reformulated as the application of an SVM discriminant function in a transformed feature space, associated with a suitable “contextual kernel”.
Contextual information is formalized through an additional feature (“stacked vector”).
A modified kernel function fuses contextual and noncontextual information as a linear combination of two related contributions.
In this framework, a novel classifier is introduced by using the “iterated conditional modes” (ICM) approach.
[Equations: discriminant function; contextual kernel; kernel-based expression of the discriminant function]
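The exact definitions did not survive in this text, so the following is a minimal sketch of one plausible reading. It assumes that the additional contextual feature of a pixel is the sum of the +1/-1 labels of its 4-connected neighbors, and that the contextual kernel adds the product of these sums, weighted by λ, to a Gaussian kernel on the spectral features. The function names and the neighborhood choice are hypothetical.

```python
import math


def gaussian_kernel(x, z, sigma2):
    """Gaussian kernel exp(-||x - z||^2 / (2 * sigma2)) on feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma2))


def neighbor_label_sum(labels, row, col):
    """Assumed contextual feature: sum of the +1/-1 labels of the
    4-connected neighbors of pixel (row, col)."""
    n_rows, n_cols = len(labels), len(labels[0])
    total = 0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < n_rows and 0 <= c < n_cols:
            total += labels[r][c]
    return total


def contextual_kernel(x, eps_x, z, eps_z, sigma2, lam):
    """Assumed contextual kernel on stacked vectors (x, eps_x) and (z, eps_z):
    noncontextual Gaussian term plus lam times a linear term on the
    contextual features."""
    return gaussian_kernel(x, z, sigma2) + lam * eps_x * eps_z


# Toy 3 x 3 label map with one isolated -1 label in the center.
labels = [[1, 1, 1],
          [1, -1, 1],
          [1, 1, 1]]
eps_center = neighbor_label_sum(labels, 1, 1)
eps_corner = neighbor_label_sum(labels, 0, 0)
k = contextual_kernel([1.0, 2.0], eps_center, [1.0, 2.0], eps_corner,
                      sigma2=0.5, lam=0.1)
```

Because a sum of two valid kernels with a nonnegative weight is itself a valid kernel, a function of this form can be used in place of K in a standard SVM solver. With λ = 0 it reduces to the noncontextual Gaussian kernel.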
PROPOSED CLASSIFIER
I = image with n channels to be classified.
T = training map.
m = classification map, updated at each iteration.
The method has the following parameters:
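The iterative update of the map m can be sketched as a generic ICM-style loop. This is an illustration under stated assumptions, not the authors' implementation: the `discriminant` callback is a hypothetical stand-in for the trained contextual SVM, labels are coded as +1/-1, and the contextual feature is taken to be the sum of the labels of the 4-connected neighbors.

```python
def neighbor_label_sum(labels, row, col):
    """Sum of the +1/-1 labels of the 4-connected neighbors of (row, col)."""
    n_rows, n_cols = len(labels), len(labels[0])
    total = 0
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < n_rows and 0 <= c < n_cols:
            total += labels[r][c]
    return total


def icm_classify(image, initial_map, discriminant, max_iter=20):
    """Relabel pixels one at a time until the map stops changing.

    image        -- 2-D list of per-pixel feature vectors (I)
    initial_map  -- 2-D list of +1/-1 labels, e.g. a noncontextual result
    discriminant -- hypothetical callable f(x, eps) returning a real score
    Returns the final map m and the number of sweeps performed.
    """
    m = [row[:] for row in initial_map]  # work on a copy
    sweeps = 0
    for sweeps in range(1, max_iter + 1):
        changed = 0
        for r in range(len(m)):
            for c in range(len(m[0])):
                # Context is read from the current map, so labels already
                # updated in this sweep are used immediately.
                eps = neighbor_label_sum(m, r, c)
                new_label = 1 if discriminant(image[r][c], eps) >= 0 else -1
                if new_label != m[r][c]:
                    m[r][c] = new_label
                    changed += 1
        if changed == 0:
            break
    return m, sweeps


# Toy example: a 3 x 3 image whose center pixel is a noisy outlier.
image = [[[1.0], [1.0], [1.0]],
         [[1.0], [-0.5], [1.0]],
         [[1.0], [1.0], [1.0]]]
initial_map = [[1 if px[0] >= 0 else -1 for px in row] for row in image]
toy_discriminant = lambda x, eps: x[0] + 0.5 * eps  # illustrative only
final_map, n_sweeps = icm_classify(image, initial_map, toy_discriminant)
```

In this toy case the isolated center label is flipped in the first sweep and the second sweep confirms that nothing else changes.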
SVM regularization parameter C;
variance of the Gaussian kernel;
weight parameter λ of the spatial kernel contribution.
Algorithms used for parameter estimation: Powell, Ho-Kashyap.
Powell’s algorithm is a local unconstrained minimization method for multidimensional spaces. It does not involve derivatives and is applied here to the cross-validation error (a nondifferentiable function) to optimize C and the variance of the Gaussian kernel.
For the estimation of λ, a recently proposed approach has been used, based on the Ho-Kashyap algorithm for the optimization of weight parameters in MRF models.
DATA SETS FOR EXPERIMENTS
Data set “Pavia”
Rural area (near Pavia)
700 x 280 pixels
4 channels (XSAR channel is shown in the figure)
Medium resolution (25 m)
Main classes: “dry soil” and “wet soil”.
Data set “Tanaro”
Flood of the Tanaro River near Alessandria
3155 x 1695 pixels
Very high resolution (1 m)
Main classes: “dry soil” and “water or flooded soil”.
Spatially disjoint training and test fields are available for both data sets.
EXPERIMENTAL RESULTS: CONFUSION MATRICES AND ACCURACIES
[Tables: Pavia, confusion matrix, noncontextual SVM; Pavia, confusion matrix, proposed method; Tanaro, confusion matrix, noncontextual SVM; Tanaro, confusion matrix, proposed method]
EXPERIMENTAL RESULTS: CLASSIFICATION MAPS
[Figures: Pavia, map generated by a noncontextual SVM; Pavia, map generated by the proposed method; Tanaro, map generated by a noncontextual SVM; Tanaro, map generated by the proposed method]
EXPERIMENTAL RESULTS: CONVERGENCE OF THE METHOD
[Figure: Tanaro, behavior of the accuracy (overall accuracy, OA; average accuracy, AA; cross-validation accuracy, XVAL) as a function of the number of iterations of the proposed method]
A feasible Markovian extension of SVM to contextual classification has been introduced.
Experiments with real data suggest that the proposed method yields a significant increase in accuracy compared with a standard (noncontextual) SVM.
Very accurate results were obtained on different types of remote-sensing data, including very high resolution COSMO/SkyMed SAR data.
Possible future extensions:
theoretical analysis of convergence properties (even though no experimental evidence was collected about possibly critical convergence issues);
testing the method with other types of remote-sensing data (in particular, optical and hyperspectral images) and with more sophisticated MRF models.
 J. Besag. Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical Society, Series B, 36(2):192–236, 1974.
 R. Brent. Algorithms for minimization without derivatives, chapter 5. Englewood Cliffs, NJ: Prentice-Hall, 1973.
 C. J. Burges. A tutorial on support vector machines for pattern recognition. Research report, Kluwer Academic Publishers, 1998.
 N. Cristianini and J. Shawe-Taylor. An Introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, 2000.
 M. Datcu, K. Seidel, and M. Walessa. Spatial information retrieval from remote sensing images: Information theoretical perspective. IEEE Trans. Geosci. Remote Sensing, 36(5):1431–1445, 1998.
 R. Dubes and A. Jain. Random fields models in image analysis. J. Appl. Stat., 16(2):131–163, 1989.
 R. O. Duda, P. E. Hart, and D. G. Stork. Pattern classification. Wiley Interscience, 2001.
 S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell., 6(6):721–741, 1984.
 D. A. Landgrebe. Signal theory methods in multispectral remote sensing. Wiley-InterScience, 2003.
 F. Melgani and S. B. Serpico. A Markov random field approach to spatio-temporal contextual image classification. IEEE Trans. Geosci. Remote Sensing, 41(11):2478–2487, 2003.
 G. Moser. Analisi di immagini telerilevate per osservazione della Terra, pages 7–48 and 140–197. ECIG, 2006.
 C. Oliver and S. Quegan. Understanding synthetic aperture radar images. SciTech Publishing, 2004.
 W. K. Pratt. Digital image processing. Wiley Interscience, 2007.
 W. Press, S. Teukolsky, W. Vetterling, and B. Flannery. Numerical recipes in C, pages 394–455. Cambridge University Press, New York, NY, U.S.A., 1992.
 J. Richards and X. Jia. Remote sensing digital image analysis. Springer, 2005.
 S. B. Serpico and G. Moser. Weight parameter optimization by the Ho-Kashyap algorithm in MRF model for supervised image classification. IEEE Trans. Geosci. Remote Sensing, 44(12):3695–3705, 2006.
 A. H. S. Solberg. Flexible nonlinear contextual classification. Pattern Recognit. Lett., 25(13):1501–1508, 2004.
 V. N. Vapnik. Statistical learning theory. Wiley Interscience, 1998.