
JUNE 25, 2012

TOPIC: BAND COMBINATION AND BAND RATIOS AND PRINCIPAL COMPONENT ANALYSIS (PCA)
Submitted To: Sir Amir Mehmood
Subject: Remote Sensing II
Group: Girls_lll
Members (Roll No.):
Atiqa Ijaz Khan, 03
Rafia Naheed, 09
Syeda Rbiya Mahmood, 14
Rabia Zahoor, 28
Table of Contents

Band Ratios
    Definitions
    Formula
    Advantage
Band Combination
Principal Component Analysis
    History
    Objective
    Definitions
    Mathematical Analysis
    Important Terms in PCA
    Outline of PCA
    Advantages
    Disadvantages
    Summary
    Expert Opinion
References
Band Ratios

Definitions:
“Band ratioing means dividing the pixels in one band by the corresponding pixels in a second band.”
“Ratio images are enhancements resulting from the division of DN values in one spectral band by the corresponding values in another band.”
“To generate a band ratio image, the digital reflectance values of the band in which a specific material reflects most strongly are divided by the corresponding values of the band in which it reflects least.”

Ratioing is useful for two reasons:
1. Differences between the spectral reflectance curves of surface types are brought out.
2. Although illumination, and consequently radiance, may vary across a scene, the ratio for an illuminated and a non-illuminated (shadowed) area of the same surface type will be the same.

Formula for the Number of Possible Ratios:
The number of possible ratios that can be developed from n bands of data is n(n-1).
Example: For the 6 non-thermal bands of Landsat TM or ETM+ data, there are 6(6-1) = 30 possible ratios (15 original and 15 reciprocal).

Example on Vegetation:
The near-infrared-to-red ratio for healthy vegetation is normally very high. That for stressed vegetation is typically lower, because near-infrared reflectance decreases and red reflectance increases. A near-infrared-to-red (or red-to-near-infrared) ratio image might therefore be useful for differentiating between areas of stressed and non-stressed vegetation. This type of ratio has been employed extensively in vegetation indices aimed at quantifying greenness and biomass.

Advantage:
1. The major advantage of ratio images is that they convey the spectral or color characteristics of image features regardless of variations in scene illumination conditions.
2. Ratio images are often useful for discriminating subtle spectral variations in a scene that are masked by brightness variations in the images from individual spectral bands or in standard color composites.

Which bands to ratio? The answer depends on the purpose of the ratio. There is a physical basis for why each band combination works, but choosing one is partly a trial-and-error process. Band ratios of this sort come in a couple of different “flavors”.

Example (ETM+):
Landsat images are composed of seven multispectral bands, each representing a different portion of the electromagnetic spectrum. To work with Landsat band combinations (RGB composites of three bands), we must first understand the specifications of each band.

Band 1 (0.45-0.52 µm, blue-green): This short wavelength penetrates water better than the other bands, and it is often the band of choice for monitoring aquatic ecosystems (mapping sediment in water, coral reef habitats, etc.). Unfortunately, this is the “noisiest” of the Landsat bands, since it is the most susceptible to atmospheric scatter.

Band 2 (0.52-0.60 µm, green): This band has qualities similar to band 1, but not as extreme. It was selected because it matches the wavelength of the green we see when looking at vegetation.
Band 3 (0.63-0.69 µm, red): Since vegetation absorbs nearly all red light (this band is sometimes called the chlorophyll absorption band), this band can be useful for distinguishing between vegetation and soil and for monitoring vegetation health.

Band 4 (0.76-0.90 µm, near infrared): Since water absorbs nearly all light at this wavelength, water bodies appear very dark. This contrasts with the bright reflectance of soil and vegetation, so it is a good band for defining the water/land interface.

Band 5 (1.55-1.75 µm, mid-infrared):
This band is very sensitive to moisture and is therefore used to monitor vegetation and soil moisture. It is also good at differentiating between clouds and snow.

Band 6 (10.40-12.50 µm, thermal infrared): This is a thermal band, which means it can be used to measure surface temperature. Band 6 is primarily used for geological applications, but it is sometimes used to measure plant heat stress. It is also used to differentiate clouds from bright soils, since clouds tend to be very cold. The spatial resolution of band 6 (60 m) is half that of the other multispectral bands (30 m).

Band 7 (2.08-2.35 µm, mid-infrared): This band is used for soil and geology mapping and also for vegetation moisture, although band 5 is generally preferred for that application.
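The ratio-count formula and the illumination argument from the band-ratio discussion can be checked with a few lines of Python and NumPy, using the ETM+ band ranges listed above. This is a minimal sketch under stated assumptions: the three-pixel reflectance values and the shadow factor `shade` are invented for illustration, and `ETM_BANDS`, `band_for_wavelength`, and `band_ratio` are hypothetical helper names, not part of any remote sensing library.

```python
from itertools import combinations, permutations

import numpy as np

# ETM+ band wavelength ranges in micrometres, as listed in the text.
ETM_BANDS = {
    1: (0.45, 0.52),    # blue-green
    2: (0.52, 0.60),    # green
    3: (0.63, 0.69),    # red
    4: (0.76, 0.90),    # near infrared
    5: (1.55, 1.75),    # mid-infrared
    6: (10.40, 12.50),  # thermal infrared
    7: (2.08, 2.35),    # mid-infrared
}


def band_for_wavelength(wavelength_um):
    """Return the first ETM+ band whose range contains the wavelength, or None."""
    for band, (low, high) in ETM_BANDS.items():
        if low <= wavelength_um <= high:
            return band
    return None


# Ratios are built from the non-thermal bands only.
non_thermal = [b for b in ETM_BANDS if b != 6]
ordered_ratios = list(permutations(non_thermal, 2))   # n(n-1) ratios, incl. reciprocals
unordered_pairs = list(combinations(non_thermal, 2))  # n(n-1)/2 "original" ratios


def band_ratio(numerator, denominator):
    """Pixel-by-pixel ratio of two bands; pixels with a zero denominator become NaN."""
    numerator = np.asarray(numerator, dtype=float)
    denominator = np.asarray(denominator, dtype=float)
    out = np.full(numerator.shape, np.nan)
    np.divide(numerator, denominator, out=out, where=denominator != 0)
    return out


# Hypothetical reflectances for three pixels: healthy vegetation, stressed vegetation, soil.
red = np.array([0.05, 0.10, 0.20])  # stands in for band 3
nir = np.array([0.50, 0.35, 0.25])  # stands in for band 4

# The same surfaces in shadow: illumination scales both bands by the same factor.
shade = 0.4
sunlit_ratio = band_ratio(nir, red)
shadow_ratio = band_ratio(nir * shade, red * shade)

print(len(ordered_ratios), len(unordered_pairs))
print(np.round(sunlit_ratio, 2), np.round(shadow_ratio, 2))
```

Because the shadow multiplies the red and near-infrared bands by the same factor, it cancels in the ratio, so the sunlit and shadowed ratios match; the near-infrared-to-red values also fall from healthy vegetation to stressed vegetation to soil, as in the vegetation example.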
Band Combination

Introduction:
Effective display of an image is critical for the effective practice of remote sensing. Band combination is the term remote sensing analysts use for the assignment of colors to represent brightness in different regions of the spectrum. A key constraint on the display of any multispectral image is that human vision perceives differences in the color of surfaces through our eyes' ability to detect differences in brightness in three additive primaries: blue, green, and red.

Need for Band Combinations:
A single-band remote sensing image may not be sufficient to extract the desired information, and handling many separate bands is inconvenient. Multiple bands may be combined, using different mathematical operations, to generate one or more transformed bands. Combining multiple bands can enhance the features of interest to the analyst. Band combination operations include addition, subtraction, ratioing, principal component analysis, and so on.

Mathematical Treatment:
In general, band combination is a combination of multiband (e.g. multispectral) images, defined as the output of a multiband function or operation. In a normal band combination, the same operation is carried out on every pixel in the image. The output is an image whose new pixel values are generated by a mathematical combination of the pixel values of the various bands of the input image.

Example (ETM+): Common Landsat Band Combinations
Individual bands can be composited in a red, green, and blue (RGB) combination to visualize the data in color. Many different combinations can be made, and each has its own advantages and disadvantages. Some commonly used Landsat RGB band combinations (color composites) are:
3, 2, 1 RGB: This color composite is as close to true color as we can get with a Landsat ETM+ image. It is also useful for studying aquatic habitats. The downside of this set of bands is that it tends to produce a hazy image.

4, 3, 2 RGB: This has qualities similar to the 3, 2, 1 composite; however, since it includes the near-infrared channel (band 4), land/water boundaries are clearer and different types of vegetation are more apparent. This was a popular band combination for Landsat MSS data, since MSS did not have a mid-infrared band.

4, 5, 3 RGB: This is crisper than the previous two images because the two shortest-wavelength bands (bands 1 and 2) are not included. Different vegetation types can be more clearly defined, and the land/water interface is very clear. Variations in moisture content are evident with this set of bands. This is probably the most common band combination for Landsat imagery.
7, 4, 2 RGB: This has properties similar to the 4, 5, 3 band combination, the biggest difference being that vegetation appears green. This is the band combination that was selected for the global Landsat mosaic created for NASA.

5, 4, 1 RGB: This band combination has properties similar to the 7, 4, 2 combination; however, it is better suited to visualizing agricultural vegetation.

Principal Component Analysis (PCA)

History:
Principal component analysis was developed by Hotelling (1933) to solve the problem of decorrelating the statistical dependency between variables in multivariate statistical data derived from exam scores. Since then, PCA has become a widely used tool in statistical analysis for measuring correlated data relationships between variables, and it has also found applications in signal processing and pattern recognition, where it is often referred to as the Karhunen-Loève transform (Therrien, 1989).
Objectives of Principal Component Analysis:
1. To discover the underlying dimensionality of the data set, or to reduce it.
2. To identify new, meaningful underlying variables.

Definition of Principal Component Analysis (PCA):
“A method in which the original data are transformed into a new coordinate system, which condenses the information found in the original inter-correlated variables into a few uncorrelated variables, called principal components.”
“In any principal components rotation, the first component or dimension accounts for the maximum proportion of the variance of the original image, and subsequent components account for the maximum proportion of the remaining variance.”

Mathematical Background of Principal Component Analysis:
Eigenanalysis: The mathematical technique used in PCA is called eigenanalysis: one solves for the eigenvalues and eigenvectors of a square symmetric matrix of sums of squares and cross products (for example, a covariance or correlation matrix). The eigenvector associated with the largest eigenvalue has the same direction as the first principal component. The eigenvector associated with the second-largest eigenvalue determines the direction of the second principal component. The sum of the eigenvalues equals the trace of the square matrix, and the maximum number of eigenvectors equals the number of rows (or columns) of this matrix.

Important Terms in Principal Component Analysis (PCA):
1. Factor analysis: The search for the “factors” (i.e. band combinations) that contain the most information.
2. Original data: The set of brightness values for n bands and m pixels.
3. PCA: A linear method of factor analysis that uses the mathematical concepts of eigenvalues and eigenvectors. It amounts to a “rotation” of the coordinate axes to identify the principal components.
4. Principal component: An optimum linear combination of band values comprising a new data layer (or image).
5. Covariance: A measure of the redundancy of two bands i and j, computed by summing the products of the two bands' deviations from their means over all m pixels and dividing by the number of pixels: $C_{ij} = \frac{1}{m}\sum_{k=1}^{m}(d_{ik}-\bar{d}_i)(d_{jk}-\bar{d}_j)$, where $d_{ik}$ is the value of band i at pixel k.
6. Correlation: Covariance normalized by the standard deviations of the two bands: $r_{ij} = C_{ij}/\sqrt{C_{ii}\,C_{jj}}$.
7. Redundant bands: Bands with a correlation coefficient (CC) of 1 contain the same information.
8. Correlation matrix: A square symmetric matrix containing the correlation coefficients between every pair of bands. It summarizes the statistical relationships in the data.
9. Eigenvector: The set of weights applied to band values to obtain a PC. Eigenvectors represent the orientation of the principal axis of each component. Eigenvectors are standardized so that the squares of their elements sum to one. Therefore, an eigenvector loading reflects the relative importance of a variable within a principal component but does not reflect the value of the component itself. Eigenvectors show the directions of the axes of a fitted ellipsoid.
10. Eigenvalue: A measure of the variance in a PC. Eigenvalues determine the lengths of the successive component axes; therefore, the greater the eigenvalue, the more “important” the component is for explaining the variation in the dataset. The percentage of the total variance explained by each eigenvalue is often more useful, as it gives the relative contribution of each component to explaining the variance in the dataset. Eigenvalues show the significance of the corresponding axis: the larger the eigenvalue, the more separation between the mapped data. For high-dimensional data, only a few eigenvalues are significant.
11. Axes rotation: Multiplying the original data matrix by the matrix of eigenvectors is equivalent to rotating the data into a new coordinate system.
12. Standardized PC: Principal components calculated using the correlation matrix.
13. Unstandardized PC: Principal components calculated using the covariance matrix.

Outline of Principal Component Analysis:
1. Start with an image data set containing the reflectance values from n bands with m pixels. This non-square n×m matrix will be called D.
2. This data set may contain “redundancies”, i.e. bands whose reflectances correlate perfectly with another band. It may also contain noise. (Noise: our definition of noise is signal that does not correlate at all between bands.)
3. Subtract the mean from each band and compute the variances and covariances between each pair of bands. Place these values into an n×n square matrix (say A); it is symmetric. Normalize the covariances by the square roots of the two variances to form the correlation matrix. This is a useful matrix to study, and it forms the basis of PCA.
Note: At this point, one could simply delete bands that correlate well with other bands. This would reduce the size of the data set.
The PCA method below is more objective and systematic.
4. Find the eigenvalues and eigenvectors of the correlation matrix A by solving

$$A V = \lambda V \qquad (1)$$

where λ is an eigenvalue and V is an eigenvector.
Eigen: The word “eigen” indicates that these quantities are characteristic of the correlation matrix A; they reveal its hidden properties.
Typically there will be n different solutions to (1), so there will be n pairs of eigenvalues and eigenvectors ($\lambda_i$ and $V_i$). The eigenvalues will be real (because A is symmetric) and non-negative (because a correlation matrix is positive semidefinite). We usually list the eigenvalues and eigenvectors in order of decreasing eigenvalue; that is, the first eigenvector corresponds to the largest eigenvalue.
5. The eigenvectors have dimension n, the number of original bands. The first eigenvector represents a synthetic spectrum containing the largest variance across the scene.
6. The eigenvectors are orthogonal to each other, for example $V_1 \cdot V_2 = 0$. They are normally scaled to unit length, that is $|V_i| = 1$. With these two properties, multiplying the original dataset by an eigenvector rotates the reflectance vector of each pixel:

$$PC_1 = V_1^{T} D \qquad (2)$$

where $PC_1$ is the first principal component. It is a vector with m components representing a brightness value for each pixel, i.e. it is a new single-band image. Its pixel values are linear combinations of the original band values for that pixel, with weights given by the components of the first eigenvector. Because of our ordering of eigenvalues, this image contains the most “information” of any single image. The first eigenvalue is proportional to the brightness variance in the first PC. To obtain the other principal components, we repeat (2) with the other eigenvectors:

$$PC_i = V_i^{T} D, \qquad i = 1, \dots, n \qquad (3)$$

The PC data layers can then be stacked to form a new “data cube”. If desired, only the first few PCs can be kept, reducing the size of the dataset. For example, if the original dataset had 200 bands (i.e. n = 200), you could keep only the ten PCs with the largest eigenvalues. This dataset is only 1/20 of the original size (10/200); the number of pixels is unchanged.
7. A remarkable property of the new data cube is that the band values are completely uncorrelated. There is no more redundancy! (More precisely, the part of the original scene that does not correlate between bands, i.e. the noise as defined in step 2, is pushed into the higher-order PCs.) Another property is that the bands are ordered by their “information content” (i.e. variance).

Advantage:
The primary advantage of PCA is that it generates orthogonal (uncorrelated) components that together represent 100% of the variance present in the original dataset.

Disadvantage:
A disadvantage of the PC representation is that one can no longer identify the spectral signatures of objects. A “pixel profile” in the new data cube is not a spectral signature (i.e. reflectance plotted against wavelength).

Summary of Principal Component Analysis (PCA):
• PCA is a technique that transforms the original vector image data into a smaller set of uncorrelated variables.
• These variables represent most of the image information and are easier to interpret.
• Principal components are derived such that the first PC accounts for as much of the variation in the original data as possible. The second (orthogonal) PC accounts for most of the remaining variation.
• PCA is useful for reducing the dimensionality (number of bands) used for analysis.
• The minimum noise fraction (MNF) method can be used with hyperspectral data for noise reduction.

Expert Opinion about PCA:
1. Mather (2003) states that PCA is a standard method for deriving reduced data or minimizing information redundancy in the original image.
2. Zumsprekel and Prinz (2000) state that PCA reduces the dimensionality of the dataset while retaining as much information as possible.
3. A more detailed treatment of PCA can be found in Jensen (1996), Mather (2003), and Gibson and Power (2000). Principal component analysis was run on bands 1, 2, 3, 4, 5, and 7. The results were six PCA bands, which were visually assessed using RGB band combinations.
4. Zumsprekel and Prinz (2000) and Rajesh (2008) state that the first PC (PC-1) combines the total albedo difference of all original TM bands, the second PC (PC-2) emphasizes the spectral differences between the visible (VIS) and infrared (IR) spectrum, and the third PC (PC-3) illustrates albedo variations within the IR spectrum. They further state that the higher principal components (PC-4 to PC-6) may contain important lithological information but are often increasingly loaded with noise. The first 3 PC bands have been selected in this study because they best highlight the lithological features.
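The outline of PCA given above (steps 1 to 7) can be sketched in Python with NumPy. This is a minimal, illustrative sketch, not the procedure used in the cited studies: it assumes a synthetic six-band image generated from random numbers (no real Landsat data), stores it as the n×m matrix D of step 1, and computes standardized PCs from the correlation matrix A. The function name `pca_bands` and the synthetic-data parameters (`weights`, noise level 0.3) are hypothetical.

```python
import numpy as np


def pca_bands(D, n_keep=None):
    """Standardized PCA of an n x m image matrix D (n bands, m pixels).

    Returns the eigenvalues in decreasing order, the matching unit eigenvectors
    (as columns), and the principal-component images as a k x m matrix.
    """
    D = np.asarray(D, dtype=float)
    # Step 3: remove each band's mean and divide by its standard deviation,
    # so that Z @ Z.T / m equals the correlation matrix A.
    Z = (D - D.mean(axis=1, keepdims=True)) / D.std(axis=1, keepdims=True)
    A = np.corrcoef(D)  # n x n symmetric correlation matrix (rows are bands)
    # Step 4: solve A V = lambda V. eigh is meant for symmetric matrices and
    # returns eigenvalues in ascending order, so reorder them to descending.
    eigvals, eigvecs = np.linalg.eigh(A)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 6: PC_i = V_i^T Z for every eigenvector, stacked into a new data cube.
    pcs = eigvecs.T @ Z
    if n_keep is not None:
        pcs = pcs[:n_keep]  # keep only the PCs with the largest eigenvalues
    return eigvals, eigvecs, pcs


# Synthetic 6-band image: a shared "brightness" signal makes the bands redundant,
# and independent noise is added to each band.
rng = np.random.default_rng(0)
m = 10_000
brightness = rng.normal(size=m)
weights = [1.0, 0.9, 0.8, 1.1, 0.7, 0.6]
D = np.vstack([w * brightness + 0.3 * rng.normal(size=m) for w in weights])

eigvals, eigvecs, pcs = pca_bands(D)
_, _, first_three = pca_bands(D, n_keep=3)

# Step 7: the PC images are uncorrelated; their covariance matrix is diagonal,
# with the eigenvalues on the diagonal.
pc_cov = np.cov(pcs, bias=True)

print("eigenvalues:", np.round(eigvals, 3))
print("fraction of variance:", np.round(eigvals / eigvals.sum(), 3))
print("trace of A:", np.trace(np.corrcoef(D)))
print("max off-diagonal covariance:", np.abs(pc_cov - np.diag(np.diag(pc_cov))).max())
```

With this synthetic input, the first eigenvalue carries most of the variance because all bands share one brightness signal, much like the total-albedo PC-1 described above. Using the covariance matrix instead of the correlation matrix (centering the bands without scaling them) would give unstandardized PCs; passing `n_keep=10` for a 200-band dataset would keep ten PC images, as in the step 6 example.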
References

PDF formats:
1. A Hybrid Image Classification Approach for the Systematic Analysis of Land Cover (LC) Changes in the Niger Delta Region.
2. A Tutorial on Principal Component Analysis.
3. Evaluating Principal Components Analysis for Identifying Optimal Bands Using Wetland Hyperspectral Measurements from the Great Lakes, USA.
4. Identifying Hydrocarbon Leakage Induced Anomalies Using Landsat-7/ETM+ Data Processing Techniques in the West Slope of Songliao Basin, China.
5. Land-Cover Classification Using ASTER Multi-Band Combinations Based on Wavelet Fusion and SOM Neural Network.

Books:
1. Nag, P. and Kudrat, M. Digital Remote Sensing.
2. Campbell, J. and Wynne, R. Introduction to Remote Sensing.
3. Jensen, J. R. (1996). Introductory Digital Image Processing: A Remote Sensing Perspective. Second edition. Prentice Hall.
4. Lillesand, T. M. and Kiefer, R. W. (2002). Remote Sensing and Image Interpretation. Fifth edition. New York: John Wiley & Sons.
5. Sahu, K. C. Textbook of Remote Sensing and Geographical Information System.