Digital Image Classification
Multispectral classification is the process of sorting pixels into
a finite number of individual classes, or categories of data,
based on their data file values.
If a pixel satisfies a certain set of criteria, it is assigned
to the class that corresponds to those criteria.
Multispectral classification may be performed using a variety
of algorithms, including:
Hard classification using supervised or unsupervised
approaches,
Classification using fuzzy logic, and/or
Hybrid approaches, often involving the use of ancillary
information.
Multispectral Classification
• Assigning each pixel in a remotely sensed image a label describing
the real-world object it represents.
What is Digital Image Classification?
Grouping of similar pixels
Separation of dissimilar ones
Assigning class labels to pixels
Resulting in a manageable number of
classes
Classification Methods
Manual
Computer Assisted
Stratified
Reasons for using Digital Image Classification
To translate the continuous variability of image data into map
patterns that provide meaning to the user.
To obtain insight into the data with respect to ground cover and
surface characteristics.
To find anomalous patterns in the image dataset.
Cost efficient in the analysis of large datasets.
Results can be reproduced.
More objective than visual interpretation.
Effective analysis of complex multi-band (spectral)
interrelationships.
Dimensionality of Data
Spectral Dimensionality is determined by the number of sets of
values being used in a process.
In image processing, each band of data is a set of values. An
image with four bands of data is said to be four-dimensional
(Jensen, 1996).
Measurement Vector
The measurement vector of a pixel is the set of data file values
for one pixel in all n bands.
Mean Vector
When the measurement vectors of several pixels are analyzed,
a mean vector is often calculated.
This is the vector of the means of the data file values in each
band. It has n elements.
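As a minimal sketch of these two definitions, assuming each pixel is stored as a plain list of data file values with one entry per band (the sample values and the function name `mean_vector` are illustrative, not taken from the slides):

```python
# Hypothetical example: three pixels, each measured in n = 4 bands.
# Each inner list is one pixel's measurement vector.
pixels = [
    [52, 40, 33, 110],
    [54, 42, 35, 118],
    [50, 41, 34, 114],
]

def mean_vector(measurement_vectors):
    """Return the per-band mean of a list of equal-length measurement vectors."""
    n_pixels = len(measurement_vectors)
    n_bands = len(measurement_vectors[0])
    return [
        sum(vec[band] for vec in measurement_vectors) / n_pixels
        for band in range(n_bands)
    ]

mu = mean_vector(pixels)
print(mu)  # [52.0, 41.0, 34.0, 114.0]
```

The result has n = 4 elements, one mean per band.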
Image Space
Image Space (col, row)
Array of elements corresponding to reflected or
emitted energy from the IFOV (instantaneous field of view).
Spatial arrangement of the measurements of the
reflected or emitted energy.
Feature Space
A feature space image is simply a graph of the data file values
of one band of data against the values of another band.
Spectral Distance
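Euclidean spectral distance is the distance in n-dimensional
spectral space.
It is a number that allows two measurement vectors to be
compared for similarity.
The spectral distance between two pixels can be calculated as
follows:
$D = \sqrt{\sum_{i=1}^{n} (d_i - e_i)^2}$
where D is the spectral distance, n is the number of bands, and
$d_i$ and $e_i$ are the data file values of pixels d and e in band i.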
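A minimal Python sketch of this Euclidean distance, assuming each pixel is given as a list of data file values, one per band (the function name and sample values are illustrative):

```python
import math

def spectral_distance(d, e):
    """Euclidean distance between two measurement vectors with the same number of bands."""
    if len(d) != len(e):
        raise ValueError("both pixels must have the same number of bands")
    return math.sqrt(sum((di - ei) ** 2 for di, ei in zip(d, e)))

pixel_d = [52, 40, 33, 110]
pixel_e = [55, 44, 33, 110]
print(spectral_distance(pixel_d, pixel_e))  # 5.0
```

The two pixels differ by 3 and 4 in the first two bands and are identical in the others, so the distance is 5.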
Image Classification Process
1. Selection of the image data
2. Definition of the clusters in the feature space
3. Validation of the result
Classification Types
Common classification procedures can be broken down into two
broad subdivisions based on the method used:
Supervised Classification
Unsupervised Classification
Supervised Classification
The identity and location of some of the land cover types, such
as urban, agriculture, and wetland, are known a priori through a
combination of field work and experience.
The analyst attempts to locate specific sites in the remotely
sensed data that represent homogeneous examples of these land
cover types. These sites are known as training sites.
Multivariate statistical parameters are calculated for these
training sites.
Every pixel both inside and outside the training sites is
evaluated and assigned to the class of which it has the highest
likelihood of being a member.
Unsupervised Classification
The identities of land cover types to be specified as classes
within a scene are generally not known a priori because
ground reference information is lacking or surface features
within the scene are not well defined.
The computer is required to group pixels with similar spectral
characteristics into unique clusters according to some
statistically determined criteria.
The analyst then combines and relabels the spectral clusters into
information classes.
Supervised vs. Unsupervised Training
In supervised training, it is important to have a set of desired classes
in mind, and then create the appropriate signatures from the data.
Supervised classification is usually appropriate when you want to
identify relatively few classes, when you have selected training sites
that can be verified with ground truth data, or when you can identify
distinct, homogeneous regions that represent each class.
On the other hand, if you want the classes to be determined by
spectral distinctions that are inherent in the data so that you can
define the classes later, the application is better suited to
unsupervised training.
Unsupervised training enables you to define many classes easily, and
identify classes that are not in contiguous, easily recognized regions.
Supervised Classification
In supervised training, you rely on your own pattern
recognition skills and a priori knowledge of the data to help the
system determine the statistical criteria (signatures) for data
classification.
To select reliable samples, you should know some information,
either spatial or spectral, about the pixels that you want to
classify.
Training Samples and Feature Space Objects
Training samples (also called samples) are sets of pixels that represent what
is recognized as a discernible pattern, or potential class. The system
calculates statistics from the sample pixels to create a parametric signature
for the class.
Selecting Training Samples
- Training data for a class should be collected from a homogeneous
environment.
- If training data are collected from n bands, then more than 10n pixels
of training data should be collected for each class.
There are a number of ways to collect training site data:
using a vector layer
defining a polygon in the image
using a class from a thematic raster layer from an
image file of the same area (i.e., the result of an
unsupervised classification)
Evaluation of Signatures
There are tests to perform that can help determine whether the signature
data are a true representation of the pixels to be classified for each
class. You can evaluate signatures that were created either from
supervised or unsupervised training.
Ellipse: view ellipse diagrams and scatterplots of data file values
for every pair of bands.
Evaluation of Signatures
Signature separability is a statistical measure of distance between
two signatures.
Separability can be calculated for any combination of bands that
is used in the classification, enabling you to rule out any bands
that are not useful in the results of the classification.
Selecting Appropriate Classification Algorithm
Various supervised classification algorithms may be used to
assign an unknown pixel to one of the classes.
The choice of a particular classifier depends on the nature of the
input data and the output required.
Parametric classification algorithms assume that the observed
measurement vectors Xc, obtained for each class in each spectral
band during the training phase, are Gaussian in nature.
Nonparametric classification algorithms make no such
assumption.
There are many classification algorithms, e.g., parallelepiped,
minimum distance, and maximum likelihood.
Parallelepiped Classification Algorithm
In the parallelepiped decision rule, the data file values of the
candidate pixel are compared to upper and lower limits. These
limits can be either:
1. The minimum and maximum data file values of each band in the
signature,
2. The mean of each band, plus and minus a number of standard
deviations, or
3. Any limits that you specify, based on your knowledge of the
data and signatures.
There are high and low limits for every signature in every band.
When a pixel’s data file values are between the limits for every
band in a signature, then the pixel is assigned to that signature’s
class.
Parallelepiped Classification Algorithm
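Therefore, if the low and high decision boundaries are defined as
$L_{ck} = \mu_{ck} - s_{ck}$ and
$H_{ck} = \mu_{ck} + s_{ck}$,
the parallelepiped algorithm becomes
$L_{ck} \le BV_{ijk} \le H_{ck}$
where $\mu_{ck}$ and $s_{ck}$ are the mean and standard deviation of the
training data for class c in band k, and $BV_{ijk}$ is the value of the
pixel at row i, column j in band k.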
Overlap Region
In cases where a pixel may fall into the overlap
region of two or more parallelepipeds, you must
define how the pixel can be classified.
The pixel can be classified by the order of the
signatures.
The pixel can be classified by the defined parametric
decision rule.
The pixel can be left unclassified.
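The decision rule and two of the overlap options above can be sketched as follows. This is a minimal illustration, assuming limits of the mean plus and minus a number of standard deviations; the class names, signature values, and the function `parallelepiped_classify` are hypothetical.

```python
# Hypothetical signatures: (class name, per-band means, per-band standard deviations).
signatures = [
    ("water",  [20.0, 15.0], [3.0, 2.0]),
    ("forest", [40.0, 60.0], [5.0, 6.0]),
    ("urban",  [45.0, 50.0], [8.0, 8.0]),
]

def parallelepiped_classify(pixel, signatures, n_std=1.0):
    """Return the first class whose box contains the pixel, or None if no box does.

    The limits are mean +/- n_std standard deviations in every band. A pixel in an
    overlap region goes to the earliest matching signature in the list, which is
    the "order of the signatures" option.
    """
    for name, means, stds in signatures:
        inside = all(
            mean - n_std * std <= value <= mean + n_std * std
            for value, mean, std in zip(pixel, means, stds)
        )
        if inside:
            return name
    return None  # the pixel is left unclassified

print(parallelepiped_classify([21, 16], signatures))  # water
print(parallelepiped_classify([42, 56], signatures))  # forest (also inside the urban box)
print(parallelepiped_classify([90, 90], signatures))  # None
```

The second pixel lies inside both the forest and urban boxes, and the list order decides. Handing such pixels to a parametric rule instead would mean replacing the early `return` with a tie-breaking step.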
Advantages:
Fast and simple.
Gives a broad classification, thus narrowing down the number of
possible classes to which each pixel can be assigned before more
time-consuming calculations are made.
Not dependent on normal distributions.
Disadvantages:
Since a parallelepiped has corners, pixels that are actually quite far,
spectrally, from the mean of the signature may be classified.
Minimum Distance to Means Classification
Algorithm
This decision rule is computationally simple and commonly used.
It requires the mean for each class in each band, $\mu_{ck}$, from the
training data.
The Euclidean distance is calculated from each pixel to each of the
signature means:
$Dist = \sqrt{(BV_{ijk} - \mu_{ck})^2 + (BV_{ijl} - \mu_{cl})^2}$
where $\mu_{ck}$ and $\mu_{cl}$ represent the means for class c measured
in bands k and l, and $BV_{ijk}$ and $BV_{ijl}$ are the pixel's values in
those bands.
Every unknown pixel is assigned to the class with the nearest mean, so
no pixel is left unclassified.
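A minimal sketch of the minimum distance to means rule, assuming the class means have already been computed from training data (the class names, mean values, and function name are illustrative):

```python
import math

# Hypothetical class mean vectors (one value per band).
class_means = {
    "water":  [20.0, 15.0],
    "forest": [40.0, 60.0],
    "urban":  [45.0, 50.0],
}

def minimum_distance_classify(pixel, class_means):
    """Assign the pixel to the class whose mean vector is nearest in Euclidean distance."""
    def distance(mean):
        return math.sqrt(sum((v - m) ** 2 for v, m in zip(pixel, mean)))
    return min(class_means, key=lambda name: distance(class_means[name]))

print(minimum_distance_classify([42, 56], class_means))  # forest
print(minimum_distance_classify([90, 90], class_means))  # forest
```

The second pixel is far from every class mean, yet it still receives a label. This is the behaviour listed under the disadvantages of this rule.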
Advantages:
Since every pixel is spectrally closer to one sample mean
than to the others, there are no unclassified
pixels.
The fastest decision rule after parallelepiped.
Disadvantages:
Pixels that should be unclassified will become
classified.
Does not consider class variability.
Mahalanobis Decision Rule
Mahalanobis distance is similar to minimum distance, except that
the covariance matrix is used in the equation.
Variance and covariance are figured in so that clusters that are
highly varied lead to similarly varied classes.
Advantages:
Takes the variability of classes into account unlike minimum
distance or parallelepiped.
May be more useful than minimum distance in cases where
statistical criteria (as expressed in the covariance matrix) must
be taken into account.
Disadvantages:
Tends to overclassify signatures with relatively large values in the
covariance matrix.
Slower to compute than parallelepiped or minimum distance.
Mahalanobis distance is parametric, meaning that it relies heavily
on a normal distribution of the data in each input band.
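The Mahalanobis rule described above can be sketched for the two-band case. This sketch assumes the usual squared form (X - M)^T Cov^-1 (X - M), which the slides do not state. The helper names and signature values are hypothetical, and the inverse is hard-coded for 2x2 matrices to keep the example short.

```python
def mahalanobis_squared(pixel, mean, cov):
    """Squared Mahalanobis distance (x - m)^T C^-1 (x - m) for two bands.

    cov is a 2x2 covariance matrix [[c11, c12], [c21, c22]].
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    inv = [[d / det, -b / det], [-c / det, a / det]]  # inverse of a 2x2 matrix
    dx = [pixel[0] - mean[0], pixel[1] - mean[1]]
    # Row vector (dx) times the inverse, then times the column vector (dx).
    tmp = [dx[0] * inv[0][0] + dx[1] * inv[1][0],
           dx[0] * inv[0][1] + dx[1] * inv[1][1]]
    return tmp[0] * dx[0] + tmp[1] * dx[1]

def mahalanobis_classify(pixel, signatures):
    """Assign the pixel to the class with the smallest Mahalanobis distance."""
    return min(signatures, key=lambda name: mahalanobis_squared(pixel, *signatures[name]))

# Hypothetical signatures: (mean vector, covariance matrix) per class.
signatures = {
    "tight": ([40.0, 40.0], [[4.0, 0.0], [0.0, 4.0]]),
    "broad": ([60.0, 60.0], [[100.0, 0.0], [0.0, 100.0]]),
}

# The pixel is equally far from both means in Euclidean terms.
print(mahalanobis_classify([50, 50], signatures))  # broad
```

The pixel goes to the class with the larger covariance values, which shows why this rule tends to overclassify such signatures.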
Maximum Likelihood/Bayesian Decision Rule
The maximum likelihood decision rule is based on the probability
that a pixel belongs to a particular class. The basic equation
assumes that these probabilities are equal for all classes, and that
the input bands have normal distributions.
If you have a priori knowledge that the probabilities are not equal
for all classes, you can specify weight factors for particular
classes. This variation of the maximum likelihood decision rule is
known as the Bayesian decision rule (Hord, 1982).
The equation for the maximum likelihood/Bayesian classifier is
as follows:
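One commonly cited form of this discriminant is given below. It is an assumption about the intended equation, not a quotation from these slides, and the symbols D, a_c, M_c, and Cov_c are not defined elsewhere in this text.

```latex
D = \ln(a_c) - \tfrac{1}{2}\ln\lvert \mathrm{Cov}_c \rvert
    - \tfrac{1}{2}\,(X - M_c)^{T}\,\mathrm{Cov}_c^{-1}\,(X - M_c)
```

Here X is the measurement vector of the candidate pixel, M_c is the mean vector of the sample for class c, Cov_c is the covariance matrix of that sample, and a_c is the a priori probability (weight) of class c, which is equal for all classes in the basic rule. With this sign convention D is the log-likelihood up to a constant, so the pixel is assigned to the class with the highest D. The last term is the squared Mahalanobis distance, which is why both rules depend on the covariance matrix.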
Advantages
The most accurate of the classifiers (if the input samples/clusters
have a normal distribution), because it takes the most variables into
consideration.
Takes the variability of classes into account by using the
covariance matrix, as does Mahalanobis distance.
Disadvantages
An extensive equation that takes a long time to compute. The
computation time increases with the number of input bands.
Maximum likelihood is parametric, meaning that it relies heavily
on a normal distribution of the data in each input band.
Tends to overclassify signatures with relatively large values in the
covariance matrix.
Unsupervised Classification
It requires only a minimum amount of initial input from the analyst.
Numerical operations are performed that search for natural groupings
of the spectral properties of pixels.
The user allows the computer to select the class means and covariance matrices
to be used in the classification.
Once the data are classified, the analyst attempts a posteriori to assign
these natural or spectral classes to the information classes of interest.
Some clusters may be meaningless because they represent mixed
classes.
Clustering algorithms used for unsupervised classification generally
vary according to the efficiency with which the clustering takes place.
Two commonly used methods are:
1. Chain method
2. ISODATA clustering
Chain Method
Operates in a two-pass mode (it passes through the registered
multispectral dataset two times).
In the first pass, the program reads through the dataset and
sequentially builds clusters.
A mean vector is associated with each cluster.
In the second pass, a minimum distance to means classification
algorithm is applied to the whole dataset on a pixel-by-pixel basis,
whereby each pixel is assigned to one of the mean vectors created
in pass 1.
The first pass automatically creates the cluster signatures to be
used by a supervised classifier.
Pass 1: Cluster Building
During the first pass, the analyst is required to supply four types
of information:
R, the radius distance in spectral space used to determine when a
new cluster should be formed.
C, a spectral space distance parameter used when merging
clusters when N is reached.
N, the number of pixels to be evaluated between each major
merging of clusters.
Cmax, the maximum number of clusters to be identified.
Pass 2: Assignment of Pixels to one of the Cmax Clusters
using Minimum Distance Classification Logic
The final cluster mean data vectors are used in a
minimum distance to means classification algorithm
to classify all the pixels in the image into one of the
Cmax clusters.
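The two passes can be sketched as follows. This is a simplified illustration under stated assumptions, not the exact chain method. A pixel joins the nearest cluster if it lies within R of its mean, or if Cmax clusters already exist. Otherwise it starts a new cluster. Every N pixels, clusters whose means are closer than C are merged. All function names and sample values are hypothetical.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def merge_close(clusters, c):
    """Merge any pair of clusters whose means are closer than c, weighting by pixel count."""
    merged = True
    while merged:
        merged = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                (mi, ni), (mj, nj) = clusters[i], clusters[j]
                if dist(mi, mj) < c:
                    mean = [(a * ni + b * nj) / (ni + nj) for a, b in zip(mi, mj)]
                    clusters[i] = (mean, ni + nj)
                    del clusters[j]
                    merged = True
                    break
            if merged:
                break  # restart the scan after every merge
    return clusters

def chain_pass1(pixels, r, c, n, cmax):
    """Pass 1: build cluster means sequentially. Each cluster is a (mean, count) pair."""
    clusters = []
    for index, pixel in enumerate(pixels, start=1):
        nearest = min(range(len(clusters)),
                      key=lambda k: dist(pixel, clusters[k][0]), default=None)
        if nearest is not None and (dist(pixel, clusters[nearest][0]) <= r
                                    or len(clusters) >= cmax):
            mean, count = clusters[nearest]
            new_mean = [(m * count + v) / (count + 1) for m, v in zip(mean, pixel)]
            clusters[nearest] = (new_mean, count + 1)
        else:
            clusters.append(([float(v) for v in pixel], 1))
        if index % n == 0:
            merge_close(clusters, c)
    return [mean for mean, _ in clusters]

def chain_pass2(pixels, means):
    """Pass 2: minimum distance to means; returns one cluster index per pixel."""
    return [min(range(len(means)), key=lambda k: dist(p, means[k])) for p in pixels]

pixels = [[10, 10], [11, 9], [50, 52], [49, 50], [12, 11], [90, 88]]
means = chain_pass1(pixels, r=5, c=3, n=4, cmax=3)
labels = chain_pass2(pixels, means)
print(labels)  # [0, 0, 1, 1, 0, 2]
```

Because the means are built sequentially, the result depends on the order in which pixels are read.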
ISODATA Clustering
The Iterative Self-Organizing Data Analysis Technique (ISODATA)
represents a comprehensive set of heuristic (rule of thumb) procedures
that have been incorporated into an iterative classification algorithm.
The ISODATA algorithm is a modification of the k-means clustering
algorithm, which includes (a) merging clusters if their separation
distance in multispectral feature space is below a user-specified
threshold and (b) rules for splitting a single cluster into two clusters.
ISODATA is iterative because it makes a large number of passes
through the remote sensing dataset until specified results are obtained,
instead of just two passes.
ISODATA does not allocate its initial mean vectors based on the
analysis of pixels. Rather, an initial arbitrary assignment of all Cmax
clusters takes place along an n-dimensional vector that runs between
very specific points in feature space.
The ISODATA algorithm normally requires the analyst to specify:
Cmax: the maximum number of clusters to be identified.
T: the maximum percentage of pixels whose class values are allowed to be
unchanged between iterations.
M: the maximum number of times ISODATA is to classify pixels and
recalculate cluster mean vectors.
The minimum number of members in a cluster.
The maximum standard deviation for a cluster.
The split separation value (if this value is changed from 0.0, it takes
the place of the standard deviation).
The minimum distance between cluster means.
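How Cmax, T, and M control the iteration can be sketched with a deliberately reduced version of the algorithm. This sketch omits the splitting and merging rules, so it is closer to plain k-means than to full ISODATA. It also assumes the starting means are spread evenly along the line from the band minima to the band maxima, and that the loop stops once T percent of labels are unchanged. Function names and sample values are hypothetical.

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def initial_means(pixels, cmax):
    """Spread cmax starting means evenly along the line from band minima to band maxima."""
    n_bands = len(pixels[0])
    lo = [min(p[b] for p in pixels) for b in range(n_bands)]
    hi = [max(p[b] for p in pixels) for b in range(n_bands)]
    steps = [k / (cmax - 1) if cmax > 1 else 0.5 for k in range(cmax)]
    return [[l + s * (h - l) for l, h in zip(lo, hi)] for s in steps]

def isodata_core(pixels, cmax, t, m):
    """Repeat assignment and mean update until t percent of labels are unchanged or m passes are done.

    Returns (means, labels). Cluster splitting and merging are omitted.
    """
    means = initial_means(pixels, cmax)
    labels = [None] * len(pixels)
    for _ in range(m):
        new_labels = [min(range(len(means)), key=lambda k: dist(p, means[k]))
                      for p in pixels]
        unchanged = sum(a == b for a, b in zip(labels, new_labels))
        labels = new_labels
        for k in range(len(means)):
            members = [p for p, lab in zip(pixels, labels) if lab == k]
            if members:  # an empty cluster keeps its previous mean
                means[k] = [sum(col) / len(members) for col in zip(*members)]
        if 100.0 * unchanged / len(pixels) >= t:
            break
    return means, labels

pixels = [[10, 10], [11, 9], [12, 11], [50, 52], [49, 50], [90, 88], [91, 90]]
means, labels = isodata_core(pixels, cmax=3, t=95, m=10)
print(labels)  # [0, 0, 0, 1, 1, 2, 2]
```

On this small dataset the labels stop changing on the second pass, so the T threshold ends the loop well before M is reached.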
Phase 1: ISODATA Cluster Building using many passes through the dataset
Figure: (a) Distribution of 10 ISODATA mean
vectors after just one iteration.
(b) Distribution of 20 ISODATA mean
vectors after 20 iterations. The bulk of
the important feature space (the gray
background) is partitioned rather well
after just 20 iterations.
Accuracy Assessment
Accuracy assessment is a general term for comparing the classification to
geographical data that are assumed to be true, in order to determine the
accuracy of the classification process. Usually, the assumed-true data are
derived from ground truth data.
Error Matrix
Once a classification has been sampled, a contingency table (also referred to
as an error matrix or confusion matrix) is developed.
- This table is used to properly analyze the validity of each class as well as
the classification as a whole.
In this way, the efficiency of the classification can be evaluated in more
detail.
Accuracy Assessment
One way to assess accuracy is to go out in the field and observe the actual land
class at a sample of locations, and compare it to the land class that was
assigned on the thematic map.
There are a number of ways to quantitatively express the amount of agreement
between ground truth classes and the remote sensing classes.
One way is to construct a confusion matrix, alternatively called an error matrix.
This is a square table, with as many rows as columns.
Each row of the table is reserved for one of the information, or remote sensing,
classes used by the classification algorithm.
Each column displays the corresponding ground truth classes in an identical
order.
                           Ground truth classes
Thematic map classes       A      B      C      No. classified pixels
A                          35     2      2      39
B                          10     37     3      50
C                          5      1      41     47
No. ground truth pixels    50     40     46     136
Accuracy Assessment
Comparison of two sources of information:
- Remote sensing derived classification map
- Reference test information
The relationship between the two sets of information is expressed as a matrix
known as an error matrix, confusion matrix, or contingency table.
Overall Accuracy
The diagonal elements tally the number of pixels classified
correctly in each class.
An overall measure of classification accuracy is
(total number of correct classifications) / (total number of classifications)
which in this example amounts to (35+37+41)/136, or 83%.
But just because 83% of classifications were accurate overall
does not mean that each category was successfully classified
at that rate.
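The calculation can be reproduced with a short sketch. It assumes the error matrix is stored as a list of rows, with thematic map classes as rows and ground truth classes as columns (the variable and function names are illustrative):

```python
# Error matrix from the example: rows are thematic map classes,
# columns are ground truth classes.
error_matrix = [
    [35, 2, 2],    # map class A
    [10, 37, 3],   # map class B
    [5, 1, 41],    # map class C
]

def overall_accuracy(matrix):
    """Sum of the diagonal divided by the total number of sampled pixels."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

print(round(overall_accuracy(error_matrix) * 100))  # 83
```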
User's Accuracy
A user of the imagery who is particularly interested in class A, say,
might wish to know what proportion of pixels assigned to class A were
correctly assigned.
In this example, 35 of the 39 pixels assigned to class A were correct,
so the user's accuracy for this category is 35/39 = 90%.
User's accuracy = (number in diagonal cell of error matrix) / (row total)
Producer's Accuracy
Contrasted with user's accuracy is producer's accuracy, which has a
slightly different interpretation.
Producer's accuracy is a measure of how much of the land in
each category was classified correctly.
It is found, for each class or category, as
Producer's accuracy = (number in diagonal cell of error matrix) / (column total)
• The producer's accuracy for class A is 35/50 = 70%.
So from this assessment, there are three measures of accuracy
which address subtly different issues:
Overall accuracy: takes no account of the source of error (errors of
omission or commission).
User's accuracy: measures the proportion of each thematic map class
which is correct.
Producer's accuracy: measures the proportion of the land base
which is correctly classified.
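The per-class measures can be sketched from the same example matrix. This assumes rows are thematic map classes and columns are ground truth classes (function and variable names are illustrative):

```python
classes = ["A", "B", "C"]
error_matrix = [
    [35, 2, 2],    # map class A
    [10, 37, 3],   # map class B
    [5, 1, 41],    # map class C
]

def users_accuracy(matrix):
    """Diagonal cell divided by its row total, one value per thematic map class."""
    return [matrix[i][i] / sum(matrix[i]) for i in range(len(matrix))]

def producers_accuracy(matrix):
    """Diagonal cell divided by its column total, one value per ground truth class."""
    return [matrix[j][j] / sum(row[j] for row in matrix) for j in range(len(matrix))]

ua = users_accuracy(error_matrix)
pa = producers_accuracy(error_matrix)
for name, u, p in zip(classes, ua, pa):
    print(f"{name}: user's {u:.1%}, producer's {p:.1%}")
```

For class A this gives 35/39 and 35/50, matching the figures in the text. Class B shows the opposite pattern: a low user's accuracy (37/50) but a high producer's accuracy (37/40).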
KAPPA Coefficient
Another measure of map accuracy is the kappa coefficient, which is a
measure of the proportional (or percentage) improvement by the classifier
over a purely random assignment to classes.
For an error matrix with r rows, and hence the same number of columns, let
N = the total number of pixels in the matrix, A = the sum of the r diagonal
elements (the numerator in the computation of overall accuracy), and
B = the sum of the r products (row total × column total). Then
$\hat{K} = \frac{N A - B}{N^2 - B}$
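A short sketch of the kappa calculation applied to the example error matrix, taking N as the total number of pixels in the matrix (the function name is illustrative):

```python
error_matrix = [
    [35, 2, 2],    # map class A
    [10, 37, 3],   # map class B
    [5, 1, 41],    # map class C
]

def kappa(matrix):
    """Kappa estimate (N*A - B) / (N^2 - B) for a square error matrix."""
    r = len(matrix)
    n = sum(sum(row) for row in matrix)                 # N: total pixels
    a = sum(matrix[i][i] for i in range(r))             # A: sum of the diagonal
    b = sum(sum(matrix[i]) * sum(row[i] for row in matrix)
            for i in range(r))                          # B: sum of row total x column total
    return (n * a - b) / (n * n - b)

print(round(kappa(error_matrix), 3))  # 0.747
```

Here N = 136, A = 113, and B = 39×50 + 50×40 + 47×46 = 6112, so kappa is 9256/12384, or about 0.75. That is lower than the 83% overall accuracy because agreement expected by chance has been discounted.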
Land Use/Land Cover
Land cover data documents how much of a region is covered by
forests, wetlands, impervious surfaces, agriculture, and other land
and water types. Water types include wetlands or open water. Land
use shows how people use the landscape – whether for
development, conservation, or mixed uses. The different types of
land cover can be managed or used quite differently.
Land cover can be determined by analyzing satellite and aerial
imagery. Land use cannot be determined from satellite imagery.
Land cover maps provide information to help managers best
understand the current landscape. To see change over time, land
cover maps for several different years are needed. With this
information, managers can evaluate past management decisions as
well as gain insight into the possible effects of their current
decisions before they are implemented.
Thematic Maps
Thematic maps are single-topic maps that focus on specific
themes or phenomena, such as population density, rainfall and
precipitation levels, vegetation distribution, and poverty.
This differs from reference maps, which include a number of
different elements like roads, topography, and political
boundaries.
