Dimensionality reduction


Linear and Non-Linear Dimensionality Reduction Techniques


Tutorial on Dimensionality Reduction
Shatakirti, MT2011096
Contents

1 Introduction
2 Linear Dimensionality Reduction
3 Non-Linear Dimensionality Reduction
  3.1 Manifolds
  3.2 Manifold Learning
    3.2.1 Isomap
    3.2.2 Locally Linear Embedding (LLE)
    3.2.3 Isomap vs Locally Linear Embedding (LLE)
  3.3 Applications of Manifold Learning
References

List of Figures

1 Finding the Principal Components
2 Principal Components Analysis on 3D data
3 Manifolds
4 Swiss Roll Manifold
5 Isomap Manifold Learning
6 LLE algorithm
1 Introduction

In recent times data sets have become very large, and many applications in data mining require deriving a classifier or a function estimate from an extremely large data set. Such data sets provide a large number of labelled examples that are used to classify data which may arrive in the future. Each example is described by a large number of features, some of which may be irrelevant and sometimes even misleading. This is a problem for an algorithm attempting to generalize from the data: an extremely complex feature set slows down any algorithm that attempts to classify it and makes it difficult to find an optimal result. To decrease the burden on classifiers and function estimators, we reduce the dimensionality of the data so that the number of features shrinks by a large extent. Dimensionality reduction thus simplifies data so that it can be processed efficiently.

Apart from aiding visualization, dimensionality reduction helps reveal the main features governing a data set. For example, suppose we want to classify an email as spam or non-spam. A common approach is to represent the email as a vector of the words appearing in it; the dimensionality of such a vector can easily reach the hundreds. A dimensionality reduction approach may reveal that there are only a few telling features, such as the words "free", "donate", etc., which are enough to classify the mail as spam.

There are two broad ways to reduce the dimensionality of a given data set:

1. Linear Dimensionality Reduction
2. Non-linear Dimensionality Reduction

2 Linear Dimensionality Reduction

The most popular algorithm for dimensionality reduction is Principal Component Analysis (PCA). Given a data set, PCA finds the directions along which the data has maximum variance, together with the relative importance of these directions. An example makes PCA more intuitive: suppose the data we have is the surface of a teapot, and we need to capture the most information about the 3D teapot.
To achieve this, we rotate the teapot to the position from which we get the most visual information. The method is as follows: first, find the axis along which the object has the largest extent on average (the red axis). Next, rotate the object around this first axis to find the axis that is perpendicular to it and along which the object has the largest extent on average (the green axis).

Figure 1: Finding the Principal Components

The two axes found are the first and the second principal components, and the average extents along these axes are called the eigenvalues. Mathematically, suppose we have n documents and m terms overall; the steps involved in PCA are:

1. Construct an m x n term-document matrix A. Each document is represented as a column vector of m dimensions.

2. Compute the empirical mean of each term.

3. Compute the normalized matrix by subtracting the empirical mean from each data dimension; the mean subtracted is the average across that dimension.

4. Calculate the m x m term covariance matrix from the normalized data.

5. Calculate the eigenvectors and the eigenvalues of the covariance matrix. Since the covariance matrix is square, these can always be computed; it is important for PCA that the eigenvectors are unit eigenvectors.
6. Once the eigenvectors are found from the covariance matrix, order them by eigenvalue, highest to lowest. This gives the components in order of significance. We can then ignore the components of lower significance: we may lose some information, but if their eigenvalues are small, we do not lose much, and the final data set has fewer dimensions. To be precise, if the data originally has n dimensions, we calculate n eigenvectors and n eigenvalues; if we keep only the first p eigenvectors, the final data set has only p dimensions. The value of p can be chosen by computing the cumulative energy of the eigenvalues: pick p such that the cumulative energy is above a certain threshold, say 90% of the total.

Figure 2: Principal Components Analysis on 3D data
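The PCA steps above can be sketched in a few lines of NumPy. This is a minimal sketch rather than the tutorial's own implementation; the function name pca and the example data are illustrative, and in practice a library routine such as scikit-learn's PCA would normally be used.

import numpy as np

def pca(data, n_components):
    # Steps 2-3: compute the empirical mean of each dimension and subtract it.
    centred = data - data.mean(axis=0)
    # Step 4: covariance matrix of the dimensions (features are columns).
    cov = np.cov(centred, rowvar=False)
    # Step 5: eigenvalues and unit eigenvectors of the symmetric covariance matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Step 6: order components by eigenvalue, highest to lowest.
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # Cumulative energy of the eigenvalues, used to choose p (e.g. 90% of the total).
    energy = np.cumsum(eigenvalues) / np.sum(eigenvalues)
    # Project the centred data onto the first n_components eigenvectors.
    return centred @ eigenvectors[:, :n_components], energy

X = np.random.rand(100, 5)          # 100 samples with 5 features
Y, energy = pca(X, n_components=2)  # keep the 2 most significant components
print(Y.shape, energy)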
Despite PCA's popularity, it has a number of drawbacks. One of the major drawbacks is the requirement that the data lie on a linear subspace. For example, in Figure 4 below (known as a swiss roll), the data is actually a 2-dimensional manifold, but PCA will not correctly extract this structure.

There are other approaches to reducing the number of dimensions. It has been observed that high-dimensional data is often much simpler than its apparent dimensionality suggests. In other words, a high-dimensional data set may contain many features that are all measurements of the same underlying cause and are therefore very closely related; consider, for example, video footage of a single object taken from multiple angles simultaneously. The features of such a data set contain a lot of overlapping information. This idea is formalized using the notion of a manifold.

3 Non-Linear Dimensionality Reduction

3.1 Manifolds

A manifold is essentially a low-dimensional Euclidean subspace onto which a higher-dimensional space is mapped. More generally, a topological manifold can be described as a topological space that, on a small enough scale, resembles Euclidean space of a specific dimension, called the dimension of the manifold. Thus a line and a circle are one-dimensional manifolds, a plane and a sphere are two-dimensional manifolds, and so on.

Figure 3: Manifolds. (a) The sphere (the surface of a ball) is a two-dimensional manifold, since it can be represented by a collection of two-dimensional maps. (b) A 1D manifold embedded in 3D. Source: Wikipedia

In figure (a) above, notice that the triangle drawn on the 3D globe can actually be represented on a flat 2D surface. In figure (b), notice that the curve sits in 3D, yet it has zero volume and zero area; the 3D representation is somewhat misleading, since the curve can be represented as a line (1D).
3.2 Manifold Learning

Manifold learning is one of the most popular approaches to non-linear dimensionality reduction. The algorithms used for this task are based on the idea that the data actually lies in a low-dimensional space but is embedded in a high-dimensional one, where the low-dimensional space reflects the underlying parameters. Manifold learning algorithms try to recover these parameters in order to find a low-dimensional representation of the data. Some of the widely used algorithms for this purpose are Isomap, Locally Linear Embedding, Laplacian Eigenmaps and Semidefinite Embedding. The example most often used to explain manifold learning is the swiss roll, a 2D manifold embedded in 3D, shown in the figure below; a short sketch for generating it follows the Isomap overview.

Figure 4: Swiss Roll Manifold

3.2.1 Isomap

Isomap, short for isometric feature mapping, was one of the first algorithms introduced for manifold learning and remains one of the most widely applied procedures for the problem. Isomap consists of two main steps:

1. Estimate the geodesic distances (distances along the manifold) between the points, using shortest-path distances on the data set's k-nearest-neighbour graph.

2. Use Multidimensional Scaling (MDS) to map the distances obtained in the first step onto a low-dimensional Euclidean space, preserving the interpoint distances computed in the first step.
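The swiss roll of Figure 4 can be generated as follows. This assumes scikit-learn, which the tutorial itself does not prescribe; the later Isomap and LLE sketches use data of this form.

from sklearn.datasets import make_swiss_roll

# 1500 points on a 2D manifold rolled up in 3D; t is the intrinsic coordinate
# along the roll, i.e. the underlying parameter the methods try to recover.
X, t = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)
print(X.shape)  # (1500, 3)
print(t.shape)  # (1500,)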
Estimating geodesic distances

A geodesic is defined as a curve that locally minimizes the distance between two points on any mathematically defined space, such as a curved manifold. Equivalently, it is a path of minimal curvature. In non-curved three-dimensional space, the geodesic is a straight line.

We assume that the data lies in D dimensions and that the manifold has dimension d. Isomap further assumes that there is a chart that preserves the distances between points: if x_i, x_j are points on the manifold and G(x_i, x_j) is the geodesic distance between them, then there is a chart f : M → R^d such that

$$\|f(x_i) - f(x_j)\| = G(x_i, x_j),$$

and the manifold is smooth enough that the geodesic distance between nearby points is approximately linear.

Multidimensional Scaling (MDS)

After finding the geodesic distances, Isomap finds points whose Euclidean distances are equal to these geodesic distances; since the manifold is isometrically embedded, such points exist. Multidimensional Scaling is a classical technique that may be used to find them: MDS constructs a set of points from a matrix of dissimilarities such that their interpoint Euclidean distances closely match the dissimilarities measured in the data's actual dimension D. Isomap uses the classical MDS (cMDS) algorithm to minimize this cost. The cMDS algorithm takes an input matrix giving dissimilarities between pairs of items and outputs a coordinate matrix whose configuration minimizes a loss function called strain.

Hence, first compute the pairwise dissimilarities of a given set of m vectors (x_1, x_2, ..., x_m) in n-dimensional space,

$$\Delta = \begin{pmatrix} 0 & \delta_{1,2} & \cdots & \delta_{1,m} \\ \delta_{2,1} & 0 & \cdots & \delta_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ \delta_{m,1} & \delta_{m,2} & \cdots & 0 \end{pmatrix},$$

and then map the vectors onto a manifold of lower dimension k << n, subject to the following optimization criterion:

$$\min_{x_1, x_2, \ldots, x_m} \sum_{i<j} \bigl( |x_i - x_j| - \delta_{i,j} \bigr)^2$$
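A minimal NumPy sketch of classical MDS as described above, using the standard double-centring formulation; the function name classical_mds and the sanity check are illustrative, not part of the tutorial.

import numpy as np

def classical_mds(delta, k):
    # delta: m x m matrix of pairwise dissimilarities; k: target dimension.
    m = delta.shape[0]
    # Double-centre the squared dissimilarities, B = -1/2 * J * delta^2 * J with
    # J = I - (1/m) * 1 1^T, so that B behaves like an inner-product matrix.
    J = np.eye(m) - np.ones((m, m)) / m
    B = -0.5 * J @ (delta ** 2) @ J
    # Eigendecompose B and keep the k largest (non-negative) eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(B)
    order = np.argsort(eigenvalues)[::-1][:k]
    scale = np.sqrt(np.maximum(eigenvalues[order], 0.0))
    # Coordinates whose Euclidean distances approximate the input dissimilarities.
    return eigenvectors[:, order] * scale

# Sanity check: recover 2D coordinates (up to rotation) from pairwise distances.
points = np.random.rand(10, 2)
delta = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
print(classical_mds(delta, k=2).shape)  # (10, 2)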
How Isomap works

The Isomap algorithm estimates the geodesic distances using shortest-path algorithms and then finds an embedding of these distances in Euclidean space using the cMDS algorithm.

Algorithm 1: Isomap
Input: x_1, x_2, ..., x_n in R^D, and a neighbourhood size k.

1. Form the k-nearest-neighbour graph with edge weights W_ij := |x_i - x_j| for neighbouring points x_i, x_j.

2. Compute the shortest-path distances between all pairs of points using Dijkstra's or Floyd's algorithm. Store the squares of these distances in D, the Euclidean distance matrix.

3. Return Y := cMDS(D).

Figure 5: Isomap Manifold Learning

One particularly helpful feature of Isomap, not found in some of the other algorithms, is that it automatically provides an estimate of the dimensionality of the underlying manifold. In particular, the number of non-zero eigenvalues found by classical MDS (cMDS) gives the underlying dimensionality.
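Algorithm 1 can be tried end to end with scikit-learn's Isomap, which internally performs the same three steps (neighbour graph, shortest paths, then an MDS-style eigendecomposition). This is a usage sketch under that assumption, and the parameter values are illustrative.

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

# k = 10 nearest neighbours for the graph of step 1; embed into 2 dimensions.
isomap = Isomap(n_neighbors=10, n_components=2)
# fit_transform builds the neighbour graph, computes the shortest-path
# distances and returns the low-dimensional embedding (steps 1-3).
Y = isomap.fit_transform(X)
print(Y.shape)  # (1500, 2): the "unrolled" swiss roll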
3.2.2 Locally Linear Embedding (LLE)

LLE assumes the manifold to be a collection of overlapping coordinate patches. If the neighbourhood sizes are small and the manifold is smooth, the patches can be considered almost linear. Like Isomap, LLE begins by finding a set of nearest neighbours of each point. It then computes, for each point, a set of weights over its neighbours that best reconstructs that point. Finally, it uses an eigenvector-based optimization to find the low-dimensional embedding of the points such that these reconstruction weights are preserved, maintaining the nonlinear structure of the manifold in the low-dimensional space.

Figure 6: LLE algorithm

To be more precise, the LLE algorithm is given as input an n x p data matrix X, with rows x_i, a desired number of dimensions q < p, and an integer k for finding local neighbourhoods, where k >= q + 1. The output is an n x q matrix Y, with rows y_i. The steps involved in the LLE algorithm are given below:
Algorithm 2: Locally Linear Embedding (LLE)

1. For each x_i, find the k nearest neighbours.

2. Find the weight matrix W which minimizes the residual sum of squares for reconstructing each x_i from its neighbours,

$$\mathrm{RSS}(W) \equiv \sum_{i=1}^{n} \Big\| x_i - \sum_{j} w_{ij} x_j \Big\|^2,$$

where w_ij = 0 unless x_j is one of x_i's k nearest neighbours, and for each i, Σ_j w_ij = 1.

3. Find the coordinates Y which minimize the reconstruction error using these weights,

$$\Phi(Y) \equiv \sum_{i=1}^{n} \Big\| y_i - \sum_{j} w_{ij} y_j \Big\|^2,$$

subject to the constraints that the columns of Y sum to zero (Σ_i Y_ij = 0 for each j) and that Y^T Y = I.
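Algorithm 2 is likewise available as scikit-learn's LocallyLinearEmbedding; the sketch below applies it to the swiss roll, again assuming scikit-learn and with illustrative parameter values.

from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

# k = 12 neighbours and q = 2 output dimensions, satisfying k >= q + 1.
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
# fit_transform finds the neighbours, the reconstruction weights W and the
# coordinates Y minimizing the embedding cost (steps 1-3 of Algorithm 2).
Y = lle.fit_transform(X)
print(Y.shape)                    # (1500, 2)
print(lle.reconstruction_error_)  # value of the embedding cost at the solution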
3.2.3 Isomap vs Locally Linear Embedding (LLE)

Embedding type
Isomap looks for an isometric embedding: it assumes there is a coordinate chart from the parameter space to the high-dimensional space that preserves interpoint distances, and it attempts to uncover this chart. LLE looks for conformal mappings: mappings which preserve local distances between points, but not the distances between all pairs of points.

Local vs global
Isomap is a global method because it considers the geodesic distances between all pairs of points on the manifold. LLE is a local method because it constructs an embedding by considering only the placement of each point with respect to its neighbours.

3.3 Applications of Manifold Learning

Manifold learning methods are adaptable data-representation techniques that enable dimensionality reduction and processing in meaningful spaces. Their success in medical image analysis, as well as in other scientific fields, lies in both their flexibility and the simplicity of their application. In medical imaging, manifold learning has been successfully used to visualize, cluster, classify and fuse high-dimensional data, as well as for Content Based Image Retrieval (CBIR), segmentation, registration, statistical population analysis, and shape modelling and classification. Some example applications:

1. Patient position detection in MRI: low-resolution images, acquired during the initial placement of the patient in the scanner, are exploited to detect the patient's position.

2. The Isomap method is used in the prediction of protein quaternary structure.

3. Medical image analysis, with applications to video endoscopy and 4D imaging.

4. Spectral clustering.

5. Identifying the increase or decrease of diseased cells or a tumour, in applications to neuroimaging.

6. Character recognition.

7. Research is ongoing into using manifold learning for image and video indexing. There are millions of videos on the internet, stored in repositories along with the people who create and share them. When a person queries for an image or a video, we need to effectively identify duplication and copyright of that image or video. For this purpose, manifold learning is being used for image/video analysis, indexing and searching.
References

[1] Lawrence Cayton, "Algorithms for manifold learning", June 15, 2005.

[2] Lindsay I. Smith, "A tutorial on Principal Component Analysis", February 26, 2002.

[3] Percy Liang, "Linear Dimensionality Reduction", October 16, 2006.

[4] VisuMap Technologies, "A layman's introduction to principal component analysis".

[5] en.wikipedia.org/wiki/Principal_component_analysis

[6] en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction