PCA analysis



Image Pattern Matching Using Principal Component Analysis Method

Lalita Kumari, Swapan Debbarma, Nikhil Debbarma, Suman Deb
Department of Computer Science, NIT Agartala, India

Published in International Journal of Advanced Engineering & Application, June 2011 Issue 6

Abstract: Pattern matching in images refers to searching a set of images to find the one that best matches a required image. Computer vision cannot be implemented successfully without a fast and effective pattern matching algorithm. A fast pattern matching algorithm can be built with the help of the Principal Component Analysis (PCA) method. In this paper we describe a pattern matching algorithm, based on the PCA method, which consumes little time at run time.

Keywords: Image Processing, Pattern Matching, PCA method, Image matching

I. INTRODUCTION

Pattern matching in the area of image processing is a current trend of research and development. Pattern matching, or pattern analysis, is widely used in Digital Image Processing. Examples of its application domains are Optical Character Recognition, human body gesture recognition, facial expression detection, computer vision, human-machine interaction, etc. Each of these areas is a current research topic around the world. Pattern matching can be done by Principal Component Analysis, Neural Networks, Fuzzy Logic, etc. In terms of the space/time complexity of the algorithms, the PCA algorithm is fast as well as efficient.

Principal component analysis (PCA) is a mainstay of modern data analysis - a black box that is widely used but (sometimes) poorly understood. It is a standard tool in diverse fields from neuroscience to computer graphics because it is a simple, non-parametric method for extracting relevant information from confusing data sets. With minimal effort PCA provides a roadmap for how to reduce a complex data set to a lower dimension to reveal the sometimes hidden, simplified structures that often underlie it.

Principal component analysis is a variable reduction procedure. It is useful when you have obtained data on a number of variables (possibly a large number of variables), and believe that there is some redundancy in those variables. In this case, redundancy means that some of the variables are correlated with one another, possibly because they are measuring the same construct. Because of this redundancy, you believe that it should be possible to reduce the observed variables into a smaller number of principal components (artificial variables) that will account for most of the variance in the observed variables.

Principal component analysis (PCA) is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. This transformation is defined in such a way that the first principal component has as high a variance as possible (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it be orthogonal to (uncorrelated with) the preceding components. Principal components are guaranteed to be independent only if the data set is jointly normally distributed. PCA is sensitive to the relative scaling of the original variables. Depending on the field of application, it is also named the discrete Karhunen–Loève transform (KLT), the Hotelling transform or proper orthogonal decomposition (POD).

PCA was invented in 1901 by Karl Pearson.[1] Now it is mostly used as a tool in exploratory data analysis and for making predictive models. PCA can be done by eigenvalue decomposition of a data covariance matrix or singular value decomposition of a data matrix, usually after mean centering the data for each attribute. The results of a PCA are usually discussed in terms of component scores (the transformed variable values corresponding to a particular case in the data) and loadings (the weight by which each standardized original variable should be multiplied to get the component score) (Shaw, 2003).

PCA is the simplest of the true eigenvector-based multivariate analyses. Often, its operation can be thought of as revealing the internal structure of the data in a way which best explains the variance in the data. If a multivariate dataset is visualised as a set of coordinates in a high-dimensional data space (1 axis per variable), PCA can supply the user with a lower-dimensional picture, a "shadow" of this object when viewed from its (in some sense) most informative viewpoint. This is done by using only the first few principal components so that the dimensionality of the transformed data is reduced.

PCA is closely related to factor analysis; indeed, some statistical packages (such as Stata) deliberately conflate the two techniques. True factor analysis makes different assumptions about the underlying structure and solves eigenvectors of a slightly different matrix.

Principal component analysis (PCA) is one of the most successful techniques that have been used to recognize faces in images. However, high computational cost and dimensionality are a major problem of this technique.

II. METHOD FOR PATTERN MATCHING

Pattern matching using the Principal Component
Analysis (PCA) method consists of the following steps:

1. Convert images from color to gray scale.
2. Resize all images to a fixed size by scaling/cropping.
3. Read all images and store their pixel values in an array, forming a matrix whose number of columns equals the total number of images read.
4. Calculate the mean of this matrix.
5. Find the deviation matrix from its mean matrix.
6. Calculate the Eigenface of this deviation matrix.
7. Project each image into the Eigenface of the deviation matrix.
8. Read the image to be matched and convert it into gray scale.
9. Calculate its difference matrix from the mean matrix.
10. Find the minimum difference with the projected test image.
11. The image with the minimum difference represents the desired image, with the highest probability of match.

The image matching algorithm discussed in this paper does not match exact pixels to pixels. Instead, it performs pattern matching in which principal components are matched with those of a training database. The image whose components best match those of the training database is the desired matched image.

There is a training database containing a set of images, against which we match the pattern of another image to find which image from the training database is the best match.

Before matching, the images of the training database are pre-processed. This is done once for one set of training images and is not repeated until the training database is changed. The image to be matched is then projected and matched by use of the pre-processed data.

Image read, resize, and gray scale conversion:

All images of the training database are read one by one and converted into gray scale. A colour image gives a 3-D matrix of data (showing the RGB values of each pixel) after reading. The RGB values are normalized to obtain a gray scale image, which is represented as a 1-D matrix (a column matrix). After that, all images of the database are reshaped to a fixed size. Suppose there are N images in the training set, i.e. we have to take one image and match it against the N images of the training set to find the best match. The gray scale images from the training database are read and stored in a matrix (say A) with N columns, where N is the number of images in the training database. Each column of this matrix represents one gray scale image from the training database.

$$ X_{(i-1)n+j} = I(i,j) $$

$$ A(i,j) = X_i \ \text{for image number } j $$

I(i,j) represents the pixel value of one gray scale image at row i and column j, and m × n is the size of the gray scale image (m rows, n columns). X is a column matrix representing one gray scale image. The size of X is t × 1, where t = m*n. A(i,j) represents the ith row and jth column of matrix A, so i indexes the entries of X and j the training image.
Equation-1

Mean calculation of image matrix:

The mean matrix is determined from the image matrix obtained in the previous step. The mean is calculated along the rows of the matrix. In this step we determine the average value of the pixels at each co-ordinate location of the image, i.e. we calculate the mean image of the training database. The mean matrix calculated in this step is a column matrix (say M).

$$ M_i = \frac{1}{N} \sum_{j=1}^{N} A(i,j) $$

Here A(i,j) represents the ith row and jth column of matrix A, and M_i represents the ith row of the mean matrix M.
Equation-2

The next step is to calculate the deviation of each image from the mean matrix calculated above. For this we find the difference between each column of the main image matrix (i.e. A) and the mean matrix (i.e. M). Each column of the resulting matrix (say B) is equal to the corresponding column of A minus the column matrix M.

$$ B(i,j) = A(i,j) - M_i $$

Here B(i,j) and A(i,j) represent the ith row and jth column of matrices B and A respectively. M_i represents the ith row of the mean matrix M.
Equation-3

Calculate Eigenface of deviation matrix:

The next step is to calculate the Eigenface of the deviation matrix obtained in the previous step. For this, we first calculate the eigenvalues and eigenvectors of the deviation matrix. Then we calculate the Eigenface by multiplying the eigenvectors with the image matrix A.

$$ E = A \, E_v $$

Here E is the Eigenface of the deviation matrix, and E_v is the eigenvector matrix of the deviation matrix.
Equation-4

Project images from training database into Eigenface of deviation matrix:

Every gray scale image is orthogonally projected into the Eigenface of the deviation matrix. For this, each gray scale image, converted into a column matrix, is multiplied by the transpose of the Eigenface matrix calculated in the previous step.

$$ P_j = E^{T} A_j $$

Here E^T is the transpose of the Eigenface of the deviation matrix. P_j and A_j represent all rows of the jth column of matrices P and A respectively. P is the projection matrix obtained by projecting every gray scale image in A (orthogonally) into E^T.
Equation-5
Processing up to this step is termed pre-processing for pattern recognition, because pattern matching has not started yet and this processing is done once for one set of training images. Once this processing is completed, there is no need to read or analyse the images of the training database again. This holds as long as the training database is not changed, regardless of which images are matched against it.

Read, resize, and gray scale conversion of image, which has to be matched with training database:

The next step is to select the target image to be matched with the training database. Read the selected image, convert it into gray scale, and resize it to the same size that was used for pre-processing the training database. Convert this resized gray scale image into a 1-D matrix (a column matrix).

$$ Z_{(i-1)n+j} = T(i,j) $$

Here T is the gray scale target image. T(i,j) represents the pixel value of T at the ith row and jth column. Z is a column matrix whose number of rows equals m * n, where m and n are the number of rows and columns of the target image T.
Equation-6

Calculate difference matrix from mean matrix:

The next step is to find the difference matrix by subtracting the mean matrix of the training database from the target image matrix Z. The test image is then projected onto the Eigenface of the training database.

$$ Q = E^{T} Z $$

Here E^T is the transpose of E, the Eigenface of the deviation matrix. Q is the projection matrix (projected test image) obtained after projecting the test image into the Eigenface of the deviation matrix.
Equation-7

Find match between training database and target test image:

The final step of pattern matching using the PCA method is the set of sub-steps stated below. Find the difference between each projected training image and the projected test image, and calculate the mean of each column considering magnitude only.

$$ \mathrm{Euc\_dist}(1,j) = \frac{1}{N} \sum_{i=1}^{N} \left| P(i,j) - Q(i) \right| $$

Here Euc_dist is a matrix with 1 row and N columns, j runs from 1 to N (the size of the column matrix Q), and N is the total number of images in the training database.
Equation-8

The entry with the minimum value in matrix Euc_dist identifies the matched image of interest:

$$ k = \arg\min_{j} \ \mathrm{Euc\_dist}(1,j) $$

If Euc_dist(1,k) has the smallest value, the kth image from the training set is the best-matched image.
Equation-9

III. IMPLEMENTATION DETAILS

Figure 1 shows a MATLAB implementation of the described algorithm. A training database of 12 RGB images of size 640 X 480 pixels was taken for experimental purposes. After conversion of the images into gray scale, resizing to 80 X 60 and storing in matrix form, matrix "A" is obtained with dimension 4800 X 12. Mean matrix M is obtained with size 4800 X 1. Difference matrix "B" is again of the same size as "A". The eigenvalue matrix and eigenvector matrix obtained are of size 12 X 12. Eigenface matrix "E", calculated from matrix B, is obtained with size 4800 X 12. Projection matrix P is obtained with size 12 X 12.

For matching, we took one image, which was resized to 80 X 60 pixels and converted into gray scale. It is stored as a column matrix Z of size 4800 X 1. The difference matrix and then the projection matrix Q, of size 12 X 1, are calculated. The Euclidean distance matrix is obtained with size 12 X 1. This matrix has its minimum value at row number 5. That means that the selected image is best matched with image number 5 from the training dataset.

IV. CONCLUSION

The method described in this paper is a fast as well as robust technique for pattern matching. Since pre-processing is done only once for a set of training images, processing time becomes very low after initialization, so the method returns results quickly. Despite its simplicity, the described method is robust and gives good results.

REFERENCES

[1] A. Jain, R. Duin, and J. Mao, "Statistical Pattern Recognition: A Review", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 1, pp. 4-37, 2000.
[2] H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, "Speeded-Up Robust Features (SURF)", Computer Vision and Image Understanding, vol. 110, no. 3, pp. 346-359, 2008.
[3] Chi-Fa Chen, Yu-Shan Tseng, and Chia-Yen Chen, "Combination of PCA and Wavelet Transforms for Face Recognition on 2.5D Images", Conf. of Image and Vision Computing '03, 26-28 November 2003.
[4] Imran S. Bajwa and S. Irfan Hyder, "PCA based Image Classification of Single-layered Cloud Types", Journal of Market Forces, vol. 1, no. 2, pp. 3-13, 2005.
[5] M. Turk and A. Pentland, "Eigenfaces for Recognition", Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, 1991.
[6] Keinosuke Fukunaga, "Introduction to Statistical Pattern Recognition", Elsevier, 1990. ISBN 0122698517. http://books.google.com/books?visbn=0122698517.
[7] I. T. Jolliffe, Principal Component Analysis, Springer Series in Statistics, 2nd ed., Springer, NY, 2002, XXIX, 487 p., 28 illus. ISBN 978-0-387-95442-4.
Fig. 1. MATLAB implementation of fast PCA method for image pattern matching
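The MATLAB listing of Fig. 1 is not reproduced in this text. As a rough stand-in, the matching stage (Equations 6 to 9) might look as follows in Python with NumPy, using the matrix sizes from Section III. Several parts are assumptions rather than the authors' code: the images are random synthetic data, the helper name `match` is hypothetical, the target is mean-subtracted before projection, the eigenvectors are taken from B^T B, and the centered matrix B is used to build E and P. The distance is the mean magnitude of the column differences, as the text describes, even though the paper names the result Euc_dist.

```python
import numpy as np


def match(target, M, E, P):
    """Return (index of the best-matching training image, distance vector)."""
    # Equation 6: flatten the target image row by row.
    Z = np.asarray(target, dtype=float).ravel()
    # Equation 7, applied to the mean-subtracted target (assumption).
    Q = E.T @ (Z - M)
    # Equation 8 as described in the text: mean magnitude of the difference
    # between each projected training image (a column of P) and Q.
    dist = np.abs(P - Q[:, None]).mean(axis=0)
    # Equation 9: the smallest entry marks the best match.
    return int(np.argmin(dist)), dist


# Synthetic stand-in for the experiment: 12 images of 60 rows x 80 columns.
rng = np.random.default_rng(1)
images = rng.integers(0, 256, size=(12, 60, 80)).astype(float)

# Pre-processing under the assumptions stated above.
A = images.reshape(12, -1).T         # 4800 x 12, one image per column
M = A.mean(axis=1)                   # mean image, length 4800
B = A - M[:, None]                   # 4800 x 12 deviation matrix
_, Ev = np.linalg.eigh(B.T @ B)      # 12 x 12 eigenvector matrix
E = B @ Ev                           # 4800 x 12 eigenface matrix
P = E.T @ B                          # 12 x 12 projection matrix

# Target: a slightly noisy copy of training image number 5 (index 4).
target = images[4] + rng.normal(0.0, 1.0, size=(60, 80))
best, dist = match(target, M, E, P)
print(best + 1)  # 5
```

Only the projection of one 4800-element vector and a comparison against a 12 × 12 matrix happen at match time, which illustrates why the run-time cost stays low once pre-processing is done.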