A DYNAMIC FORMULATION OF THE PATTERN RECOGNITION PROBLEM

M.B. Shavlovsky (1), O.V. Krasotkina (2), V.V. Mottl (3)

(1) Moscow Institute of Physics and Technology, Institutsky Pereulok 9, Dolgoprudny, Moscow Region, 141700; shavlovsky@yandex.ru
(2) Tula State University, Lenin Ave. 92, Tula, 300600; krasotkina@uic.tula.ru
(3) Computing Center of the Russian Academy of Sciences, Vavilov St. 40, Moscow, 119333; vmottl@yandex.ru

The classical learning problem of pattern recognition in a finite-dimensional linear space of real-valued features is studied under the conditions of a non-stationary universe. The simplest statement of this problem, with two classes of objects, is considered under the assumption that the instantaneous property of the universe is completely expressed by a discriminant hyperplane whose parameters change sufficiently slowly in time. In this case, any object has to be considered along with the time marker which specifies when the object was selected from the universe, and the training set becomes, in effect, a time series. The training criterion of non-stationary pattern recognition is formulated as a generalization of the classical Support Vector Machine. The respective numerical algorithm has computational complexity proportional to the length of the training time series.

This work is supported by grants of the Russian Foundation for Basic Research No. 05-01-00679 and No. 06-01-00412.

Introduction

The aim of this study is the creation of the basic mathematical framework and the simplest algorithms for solving typical practical problems of pattern recognition learning in universes whose properties change in time. The commonly known classical statement of the pattern recognition problem rests on the tacit assumption that the properties of the universe at the moment of decision making remain the same as when the training set was formed. The more realistic assumption of a non-stationary universe, which is accepted in this work, inevitably leads to the necessity of analyzing a sequence of samples at successive time moments and finding different recognition rules for them.

The classical stationary pattern recognition problem with two classes of objects: the Support Vector Machine

Let each object of the universe $\omega \in \Omega$ be represented by a point in the linear space of features $x(\omega) = (x^{(1)}(\omega), \dots, x^{(n)}(\omega)) \in \mathbb{R}^n$, and let its hidden membership in one of two classes be specified by the value of the class index $y(\omega) \in \{1, -1\}$. The classical approach to the training problem, developed by V. Vapnik [1], treats the model of the universe as a discriminant function defined by a hyperplane with a priori unknown direction vector and threshold: $f(x(\omega)) = a^T x(\omega) + b$, primarily $> 0$ if $y(\omega) = 1$ and $< 0$ if $y(\omega) = -1$.

The unknown parameters of the hyperplane are to be estimated by analyzing a training set of objects $\{\omega_j,\ j = 1, \dots, N\}$ represented by their feature vectors and class-membership indices, so that the training set as a whole is a finite set of pairs $\{(x_j \in \mathbb{R}^n,\ y_j \in \mathbb{R}),\ j = 1, \dots, N\}$. The commonly adopted training principle is that of the optimal discriminant hyperplane, chosen by the criterion of maximizing the number of points classified correctly with a guaranteed margin, conventionally taken equal to unity:

$$J(a, b, \delta_j,\ j = 1, \dots, N) = a^T a + C \sum_{j=1}^{N} \delta_j \to \min, \tag{1}$$
$$y_j(a^T x_j + b) \ge 1 - \delta_j, \qquad \delta_j \ge 0, \qquad j = 1, \dots, N.$$

The notion of time is completely absent here.
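At the optimum of (1), each slack takes the value $\delta_j = \max(0,\ 1 - y_j(a^T x_j + b))$, so the constrained criterion collapses into an unconstrained hinge-loss objective. The sketch below trains this form by subgradient descent; it is a minimal illustration under assumed toy data, step size, and function names, not the authors' implementation.

```python
import numpy as np

def fit_svm(X, y, C=1.0, lr=1e-3, epochs=1000):
    """Minimize a^T a + C * sum_j max(0, 1 - y_j (a^T x_j + b)),
    i.e. criterion (1) with the slacks delta_j eliminated via the
    hinge identity, by plain subgradient descent."""
    N, n = X.shape
    a, b = np.zeros(n), 0.0
    for _ in range(epochs):
        margins = y * (X @ a + b)
        viol = margins < 1                     # points violating the unit margin
        grad_a = 2 * a - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b

# toy two-class data (hypothetical)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1, (50, 2)), rng.normal(+1, 1, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])
a, b = fit_svm(X, y, C=10.0)
print("training accuracy:", np.mean(np.sign(X @ a + b) == y))
```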
The mathematical model of a non-stationary universe and the main kinds of training problems

The principal novelty of the concept of the non-stationary universe is the involvement of the time factor $t$. It is assumed that the main property of the non-stationary universe is completely expressed by a time-varying discriminant hyperplane which, in its turn, is completely determined by a direction vector and a threshold, both being functions of time: $f_t(x(\omega)) = a_t^T x(\omega) + b_t$, primarily $> 0$ if $y(\omega) = 1$ and $< 0$ if $y(\omega) = -1$.

Any object $\omega$ is always to be considered along with the time mark of its appearance, $(\omega, t)$. As a result, the training set gains the structure of a set of triples instead of pairs: $\{(x_j \in \mathbb{R}^n,\ y_j \in \mathbb{R},\ t_j),\ j = 1, \dots, N\}$. If we order the objects as they appear, it is more appropriate to speak of a training sequence than of a training set, and to consider it, in the general case, as a time series with varying time steps.

The hidden discriminant hyperplane has different values of the direction vector and threshold at different time moments $t_j$. So there exists a two-component time series with one hidden and one observable component, respectively $(a_j, b_j)$ and $(x_j, y_j)$. The dynamic formulation thus turns the training problem into one of two-component time series analysis, in which the hidden component is to be estimated from the observable one. This is a standard signal (time series) analysis problem whose specificity boils down to the assumed model of the relationship between the hidden and the observable component. In accordance with the classification introduced by N. Wiener [2], it is natural to distinguish between at least two kinds of training problems as problems of estimating the hidden component.

The problem of filtration of the training time series. Let a new object appear at the time moment $t_j$ when the feature vectors and class-membership indices of the previous ones, $\dots, (x_{j-1}, y_{j-1}), (x_j, y_j)$, are already registered, including the current moment $(\dots, t_{j-1}, t_j)$. It is required to recurrently estimate the parameters of the discriminant hyperplane $(\hat a_j, \hat b_j)$ at each time moment $t_j$, immediately in the process of observation.

The problem of interpolation. Let the training time series be completely registered over some time interval, $\{(x_1, y_1), \dots, (x_N, y_N)\}$, before its processing starts. It is required to estimate the time-varying parameters of the discriminant hyperplane over the entire observation interval, $\{(\hat a_1, \hat b_1), \dots, (\hat a_N, \hat b_N)\}$.
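To make the triple structure $(x_j, y_j, t_j)$ concrete, here is a hypothetical generator of a toy non-stationary training sequence: the true direction vector $a_t$ rotates slowly with the angular rate omega, so adjacent hyperplanes differ only slightly, and the threshold is fixed at $b_t = 0$ for simplicity. Every name and constant is an illustrative assumption, not part of the paper.

```python
import numpy as np

def make_drifting_series(N=200, omega=0.01, seed=0):
    """Toy non-stationary training sequence {(x_j, y_j, t_j)} with varying
    time steps: the true direction vector a_t = (cos(omega*t), sin(omega*t))
    rotates slowly, as the slow-change model assumes."""
    rng = np.random.default_rng(seed)
    t = np.cumsum(rng.uniform(0.5, 1.5, N))              # irregular time grid
    X = rng.normal(0.0, 1.0, (N, 2))
    A = np.stack([np.cos(omega * t), np.sin(omega * t)], axis=1)
    y = np.where((A * X).sum(axis=1) >= 0.0, 1.0, -1.0)  # labels from a_t^T x
    return X, y, t

X, y, t = make_drifting_series()
```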
It is assumed that the parameters $a_t$ and $b_t$ of the discriminant hyperplane change slowly, in the sense that the values

$$\frac{(a_j - a_{j-1})^T (a_j - a_{j-1})}{t_j - t_{j-1}} \qquad \text{and} \qquad \frac{(b_j - b_{j-1})^2}{t_j - t_{j-1}}$$

are, as a rule, sufficiently small. This assumption prevents the filtration and interpolation problems from degenerating into a collection of independent ill-posed two-class training problems, each with a single observation.

From the formal point of view, the interpolation-based estimate of the discriminant hyperplane parameters $(\hat a_N, \hat b_N)$ obtained at the last point of the observation interval is just the solution of the filtration problem at this time moment. However, the essence of the filtration problem is the requirement to evaluate the estimates in on-line mode, immediately as the observations come one after another, without solving, each time, the interpolation problem for a time series of increasing length.
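To see why the on-line requirement is restrictive, consider the brute-force alternative: refit a stationary classifier from scratch on everything observed so far at each new time moment. This is exactly the repeated re-solution that filtration is meant to avoid, and its cost per step grows with $j$. The sketch below implements this naive baseline with an off-the-shelf stationary linear SVM as a stand-in; it is not the recurrent algorithm the paper has in mind, and all names are hypothetical.

```python
import numpy as np
from sklearn.svm import LinearSVC

def naive_filtration(X, y, t, warmup=10):
    """At each time moment, refit a stationary linear SVM on all data
    seen so far.  Step j costs ever more as j grows, which is what a
    recurrent filtration scheme avoids.  Assumes both classes occur
    within the warm-up prefix."""
    estimates = []
    for j in range(warmup, len(y) + 1):
        clf = LinearSVC(C=1.0).fit(X[:j], y[:j])   # all observations up to t_{j-1}
        estimates.append((t[j - 1], clf.coef_.ravel().copy(), clf.intercept_[0]))
    return estimates
```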
The training criterion in the interpolation mode

We consider here only the interpolation problem. The proposed formulation differs from a collection of classical SVM-based criteria (1) for consecutive time moments only by the presence of additional terms which penalize the difference between adjacent values of the hyperplane parameters $(a_{j-1}, b_{j-1})$ and $(a_j, b_j)$:

$$J(a_j, b_j, \delta_j;\ j = 1, \dots, N) = \sum_{j=1}^{N} a_j^T a_j + C \sum_{j=1}^{N} \delta_j + \sum_{j=2}^{N} \left[ D_a \frac{(a_{t_j} - a_{t_{j-1}})^T (a_{t_j} - a_{t_{j-1}})}{t_j - t_{j-1}} + D_b \frac{(b_{t_j} - b_{t_{j-1}})^2}{t_j - t_{j-1}} \right] \to \min, \tag{2}$$
$$y_j(a_j^T x_j + b_j) \ge 1 - \delta_j, \qquad \delta_j \ge 0, \qquad j = 1, \dots, N.$$

The coefficients $D_a > 0$ and $D_b > 0$ are hyper-parameters which preset the desired level of smoothing of the instantaneous parameters of the discriminant hyperplane.

The criterion (2) implements the concept of an optimal, sufficiently smooth sequence of discriminant hyperplanes, in contrast to the concept of the single optimal hyperplane in (1). The sought-for hyperplanes have to provide the correct classification of the feature vectors for as many time moments as possible with a guaranteed margin taken equal to unity, just as in (1).

The training algorithm

Just like the classical training problem, the dynamic problem (2) is a quadratic programming problem, but it contains $N(n+1) + N$ variables in contrast to the $(n+1) + N$ variables in (1). It is known that the computational complexity of a quadratic programming problem of general kind is proportional to the cube of the number of variables, i.e. the dynamic problem appears, at first glance, to be essentially more complicated than the classical one.

However, the goal function of the dynamic problem $J(a_j, b_j, \delta_j;\ j = 1, \dots, N)$ is pair-wise separable, i.e. representable as a sum of partial functions each of which depends only on one or two variables associated with one or two adjacent time moments. This circumstance makes it possible to build an algorithm which numerically solves the problem in time proportional to the length $N$ of the training time series.

The application of the Kuhn-Tucker theorem to the dynamic problem (2) turns it into the dual form with respect to the Lagrange multipliers $\lambda_j \ge 0$ at the inequality constraints $y_j(a_j^T x_j + b_j) \ge 1 - \delta_j$:

$$W(\lambda_1, \dots, \lambda_N) = \sum_{j=1}^{N} \lambda_j - \frac{1}{2} \sum_{j=1}^{N} \sum_{l=1}^{N} y_j y_l \left( x_j^T Q_{jl} x_l + f_{jl} \right) \lambda_j \lambda_l \to \max, \tag{3}$$
$$\sum_{j=1}^{N} y_j \lambda_j = 0, \qquad 0 \le \lambda_j \le C/2, \qquad j = 1, \dots, N.$$

The matrices $Q_{jl}$ $(n \times n)$ and $F = (f_{jl})$ $(N \times N)$ do not depend on the training time series; they are determined only by the coefficients $D_a$ and $D_b$ which penalize in (2) the unsmoothness of the sequences of hyperplane parameters, respectively the direction vectors and the thresholds.
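For small problems, criterion (2) can be checked by minimizing it directly in the primal, with the slacks again eliminated through the hinge identity. The scipy sketch below does this under assumed toy settings; being a dense general-purpose solver, it ignores the pair-wise separable structure that the linear-time algorithm exploits, so it serves only as a reference implementation of the objective, not as the paper's method.

```python
import numpy as np
from scipy.optimize import minimize

def fit_dynamic_svm(X, y, t, C=10.0, D_a=100.0, D_b=100.0):
    """Minimize criterion (2) over (a_1..a_N, b_1..b_N), with the slacks
    eliminated via delta_j = max(0, 1 - y_j(a_j^T x_j + b_j)).
    Dense and derivative-free, so only practical for small N (tens)."""
    N, n = X.shape
    dt = np.diff(t)                                  # t_j - t_{j-1}

    def J(z):
        A, b = z[:N * n].reshape(N, n), z[N * n:]
        hinge = np.maximum(0.0, 1.0 - y * ((A * X).sum(axis=1) + b))
        smooth = (D_a * ((np.diff(A, axis=0) ** 2).sum(axis=1) / dt).sum()
                  + D_b * ((np.diff(b) ** 2) / dt).sum())
        return (A ** 2).sum() + C * hinge.sum() + smooth

    res = minimize(J, np.zeros(N * (n + 1)), method="Powell")  # hinge is non-smooth
    return res.x[:N * n].reshape(N, n), res.x[N * n:]
```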
Theorem. The solution of the training problem (2) is completely determined by the training time series and by the values of the Lagrange multipliers $(\lambda_1, \dots, \lambda_N)$ obtained as the solution of the dual problem (3):

$$\hat a_j = \sum_{l:\ \lambda_l > 0} y_l \lambda_l\, Q_{jl} x_l, \qquad \hat b_j = b + \sum_{l:\ \lambda_l > 0} y_l \lambda_l\, f_{jl}, \tag{4}$$

$$b = \frac{b' + b''}{\left| \{ j:\ 0 < \lambda_j < C/2 \} \right|}, \qquad b' = \sum_{j:\ 0 < \lambda_j < C/2} y_j, \qquad b'' = -\sum_{j:\ 0 < \lambda_j < C/2}\ \sum_{l:\ \lambda_l > 0} y_l \lambda_l \left( x_l^T Q_{jl} x_j + f_{jl} \right). \tag{5}$$

It is seen from these formulas that the solution of the dynamic training problem depends only on those elements of the training time series $(x_j, y_j)$ whose Lagrange multipliers have obtained positive values $\lambda_j > 0$. It is natural to call the feature vectors of the respective objects the support vectors. So we have come to a generalization of the Support Vector Machine [1] which follows from the concept of the optimal discriminant hyperplane (1).
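As a sketch of how formulas (4) would be applied, the function below reconstructs the per-moment hyperplane parameters from a dual solution. The paper states that $Q_{jl}$ and $f_{jl}$ are determined by $D_a$ and $D_b$ but does not spell out their construction in this text, so they are taken here as given inputs, and the threshold $b$ is assumed to have been computed from (5); all names and shapes are hypothetical.

```python
import numpy as np

def hyperplanes_from_multipliers(lam, y, X, Q, F, b):
    """Formulas (4):  a_hat_j = sum_{l: lam_l>0} y_l lam_l Q_{jl} x_l,
    b_hat_j = b + sum_{l: lam_l>0} y_l lam_l f_{jl}.
    Assumed shapes: Q is (N, N, n, n), F is (N, N), both built elsewhere
    from D_a and D_b; b comes from (5)."""
    sv = lam > 0                                   # the support vectors
    w = y[sv] * lam[sv]
    A_hat = np.einsum('l,jlmk,lk->jm', w, Q[:, sv], X[sv])
    b_hat = b + (w * F[:, sv]).sum(axis=1)
    return A_hat, b_hat
```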
The classical training problem is a particular case of problem (2) in which the penalties on the time variation of the hyperplane parameters grow infinitely, $D_a \to \infty$ and $D_b \to \infty$. In this case we have $Q_{jl} = I$ and $f_{jl} = 0$, and the dual problem (3) turns into the classical dual problem [1] which corresponds to the initial problem (1):

$$W(\lambda_1, \dots, \lambda_N) = \sum_{j=1}^{N} \lambda_j - \frac{1}{2} \sum_{j=1}^{N} \sum_{l=1}^{N} \left( y_j y_l\, x_j^T x_l \right) \lambda_j \lambda_l \to \max,$$
$$\sum_{j=1}^{N} y_j \lambda_j = 0, \qquad 0 \le \lambda_j \le C/2, \qquad j = 1, \dots, N.$$

The formulas (4) and (5) then determine the training result in accordance with the classical support vector method, $\hat a = \hat a_1 = \dots = \hat a_N$ and $\hat b = \hat b_1 = \dots = \hat b_N$:

$$\hat a = \sum_{j:\ \lambda_j > 0} y_j \lambda_j x_j,$$
$$\hat b = \frac{b' + b''}{\left| \{ j:\ 0 < \lambda_j < C/2 \} \right|}, \qquad b' = \sum_{j:\ 0 < \lambda_j < C/2} y_j, \qquad b'' = -\sum_{j:\ 0 < \lambda_j < C/2}\ \sum_{l:\ \lambda_l > 0} y_l \lambda_l\, x_l^T x_j.$$

Despite the fact that the dual problem (3) is not pair-wise separable, the pair-wise separability of the initial problem (2) makes it possible to compute the gradient of the goal function $W(\lambda_1, \dots, \lambda_N)$ at each point $(\lambda_1, \dots, \lambda_N)$ and then to find the optimal admissible maximization direction relative to the constraints via an algorithm of linear computational complexity with respect to the length of the training time series. In particular, the standard steepest descent method of solving quadratic programming problems [3], applied to the function $W(\lambda_1, \dots, \lambda_N)$, yields a generalization of the well-known SMO (Sequential Minimal Optimization) algorithm [4] which is typically used for solving dual problems in SVM.

References

1. Vapnik V. Statistical Learning Theory. John Wiley & Sons, 1998.
2. Wiener N. Extrapolation, Interpolation, and Smoothing of Stationary Time Series with Engineering Applications. Technology Press of MIT and John Wiley & Sons, 1949. 163 p.
3. Bazaraa M.S., Sherali H.D., Shetty C.M. Nonlinear Programming: Theory and Algorithms. John Wiley & Sons, 1993.
4. Platt J.C. Fast training of support vector machines using sequential minimal optimization. In: Advances in Kernel Methods: Support Vector Learning. MIT Press, Cambridge, MA, 1999.