TRAINING REPORT
OF
SIX MONTHS INDUSTRIAL TRAINING,
UNDERTAKEN
AT
DEFENCE RESEARCH AND DEVELOPMENT
ORGANIZATION
IN
DEFENCE TERRAIN RESEARCH LABORATORY
ON
MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR
CLUSTERING OF REMOTE SENSING DATA
SUBMITTED IN PARTIAL FULFILLMENT OF THE DEGREE
OF
B.E. (ECE)
Under the Guidance of: Submitted By:
Name: T S RAWAT Name: DAKSH RAJ CHOPRA
Designation: SCIENTIST ‘E’ University ID No.: B120020081
Department: DEFENCE TERRAIN RESEARCH LABORATORY
Chitkara University, Himachal Pradesh, India.
ACKNOWLEDGEMENT
I take this opportunity to express my profound gratitude and deep regards to my guide Sc. T S
Rawat for his exemplary guidance, monitoring and constant encouragement throughout the
course of this thesis. The blessings, help and guidance given by him from time to time shall carry me
a long way in the journey of life on which I am about to embark.
I also take this opportunity to express a deep sense of gratitude to Sc. Sujata Das, DTRL for
her cordial support, valuable information and guidance, which helped me in completing this
task through various stages.
I am obliged to staff members of DTRL for the valuable information provided by them in their
respective fields. I am grateful for their cooperation during the period of my assignment.
Lastly, I thank the Almighty, my parents, brother, sisters and friends for their constant
encouragement, without which this assignment would not have been possible.
PREFACE
There is a famous saying “The theory without practical is lame and practical
without theory is blind.”
This project is the result of one semester of training at the Defence Terrain Research
Laboratory. The internship is an integral part of the Engineering course and aims at providing
students with first-hand experience of industry. Such practical experience helps students to
view industrial work and knowledge closely. I was fortunate to get the opportunity to pursue
my industrial training in a reputed, well-established, fast-growing and professionally managed
organization like the Defence Terrain Research Laboratory, where I was assigned "MATLAB
IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF
REMOTE SENSING DATA."
CONTENTS
Sr. No. Topic
1. Project Undertaken
2. Introduction
3. Requirement Analysis with Illustrations
4. SOM Algorithm
5. Experiments during training
6. Additional feature of SOM
7. Future Enhancement
8. Bibliography and References
1. Project Undertaken
My internship was based on a single objective: Self-Organizing Maps (SOM). The first
questions that come to mind are why this topic was chosen, why we need the SOM, and what
it does; all of that is addressed in this report. An earlier method, the Sammon projection,
produces an image in which data items with similar attributes lie closer to each other and
dissimilar ones lie farther apart, but it does not compress high-dimensional data into a small,
ordered set of models. For that we need the SOM, which learns, by a machine-learning process,
from the attributes presented to the neurons at its input.
The Self-Organizing Map represents a set of high-dimensional data items as a quantized two
dimensional image in an orderly fashion. Every data item is mapped into one point (node) in
the map, and the distances of the items in the map reflect similarities between the items.
The SOM is a data-mining technique that works on the principle of pattern recognition. It uses
the idea of adaptive networks that started Artificial Intelligence research, and this further helps
in data analysis. For example, when studying 17 different elements, we would normally
differentiate them on the basis of their physical properties, but the SOM learned the property
values and found a pattern. After that it clustered those 17 elements.
The Self-Organizing Map (SOM) is a data-analysis method that visualizes similarity relations
in a set of data items. For instance, in economics, it has been applied to the comparison of
enterprises at different levels of abstraction, to assess their relative financial conditions, and to
profile their products and customers.
Next come the industrial applications of this method. In industry, the monitoring of processes,
systems and machinery by the SOM method has been a very important application, and there
the purpose is to describe the masses of different input states by ordered clusters of typical
states. In science and technology at large, there exist unlimited tasks where the research objects
must be classified on the basis of their inherent properties, to mention the classification of
proteins, genetic sequences and galaxies.
Further applications of the SOM are:
1. Statistical methods at large
(a) Exploratory data analysis
(b) Statistical analysis and organization of texts
2. Industrial analyses, control, and telecommunications
3. Biomedical analyses and applications at large
4. Financial applications
2. Introduction
2.1 SOM Representation
Only in some special cases the relation of input items with their projection images is
one-to-one in the SOM. More often, especially in industrial and scientific applications,
the mapping is many-to-one: i.e., the projection images on the SOM are local averages
of the input-data distribution, comparable to the k-means averages in classical vector
quantization. In the VQ, the local averages are represented by a finite set of codebook
vectors. The SOM also uses a finite set of “codebook vectors”, called the models, for
the representation of local averages. An input vector is mapped into a particular node
on the SOM array by comparing it with all of the models, and the best-matching model,
called the winner, is identified, like in VQ. The most essential difference with respect
to the k-means clustering, however, is that the models of the SOM also reflect
topographic relations between the projection images which are similar to those of the
source data. So the SOM is actually a data compression method, which represents the
topographic relations of the data space by a finite set of models on a topographic map,
in an orderly fashion.
In the standard-of-living diagram shown in Fig. 2.1, unique input items are mapped on
unique locations on the SOM. However, in the majority of applications, there are
usually many statistically distributed variations of the input items, and the projection
image that is formed on the SOM then represents clusters of the variations. We shall illustrate
this fact more clearly with examples. So, in its genuine form the SOM differs from
all of the other nonlinearly projecting methods, because it usually represents a big data
set by a much smaller number of models, sometimes also called ”weight vectors” (this
latter term comes from the theory of artificial neural networks), arranged as a
rectangular array of nodes. Each model has the same number of parameters as the
number of features in the input items. However, an SOM model may not be a replica
of any input item but only a local average over a subset of items that are most similar
to it. In this sense the SOM works like the k-means clustering algorithm, but in addition,
in a special learning process, the SOM also arranges the k means into a topographic
order according to their similarity relations. The parameters of the models are variable
and they are adjusted by learning such that, in relation to the original items, the
similarity relations of the models finally approximate or represent the similarity
relations of the original items. It is obvious that an insightful view of the complete data
base can then be obtained at one glance.
Fig. 2.1. Structured diagram of the data set chosen to describe the standard of living in 126 countries of the world in the year 1992. The
abbreviated country symbols are concentrated onto locations in the (quantized) display computed by the SOM algorithm. The symbols written
in capital letters correspond to those 78 countries for which at least 28 indicators out of 39 were given, and they were used in the real
computation of the SOM. The symbols written in lower-case letters correspond to countries for which more than 11 indicator values were
missing, and these countries are projected to locations based on the incomplete comparison of their given attributes with those of the 78
countries.
2.2 SOM Display
It shall be emphasized that unlike in the other projective methods, in the SOM the
representations of the items are not moved anywhere in their “topographic” map for their
ordering. Instead, the adjustable parameters of the models are associated with fixed locations
of the map once and for all, namely, with the nodes of a regular, usually two-dimensional array
(Fig. 2). A hexagonal array, like the pixels on a TV screen, provides the best visualization.
Initially the parameters of the models can even have random values. The correct final values
of the models or “weight vectors” will develop gradually by learning. The representations, i.e.,
the models, become more or less exact replicas of the input items when their sets of feature
parameters are tuned towards the input items during learning. The SOM algorithm constructs
the models such that: After learning, more similar models will be associated with nodes that
are closer in the array, whereas less similar models will be situated gradually farther away in
the array.
It may be easier to understand the rather involved learning principles and mathematics of the
SOM, if the central idea is first expressed in the following simple illustrative form. Let X denote
a general input item, which is broadcast to all nodes for its concurrent comparison with all of
the models. Every input data item shall select the model that matches best with the input item,
and this model, called the winner, as well as a subset of its spatial neighbors in the array, shall
be modified for better matching. Like in the k-means clustering, the modification is
concentrated on a selected node that contains the winner model. On the other hand, since a
whole spatial neighbourhood around the winner in the array is modified at a time, the degree
of local, differential ordering of the models in this neighbourhood, due to a smoothing action,
will be increased. The successive, different inputs cause corrections in different subsets of
models. The local ordering actions will gradually be propagated over the array.
2.3 SOM – A Neural Model
Many principles in computer science have started as models of neural networks. The first
computers were nicknamed “giant brains,” and the electronic logic circuits used in the first
computers, as contrasted with the earlier electromechanical relay-logic (switching) networks,
were essentially nothing but networks of threshold triggers, believed to imitate the alleged
operation of the neural cells.
The first useful neural-network models were adaptive threshold-logic circuits, in which the
signals were weighted by adaptive (”learning”) coefficients. A significant new aspect
introduced in the 1960s was to consider collective effects in distributed adaptive networks,
which materialized in new distributed associative memory models, multilayer signal-
transforming and pattern-classifying circuits, and networks with massive feedbacks and stable
eigenstates, which solved certain optimization problems.
Against this background, the Self-Organizing Map (SOM) introduced around 1981-82 may be
seen as a model of certain cognitive functions, namely, a network model that is able to create
organized representations of sensory experiences, like the brain maps in the cerebral cortex
and other parts of the central nervous system do. In the first place the SOM gave some hints of
how the brain maps could be formed postnatally, without any genetic control. The first
demonstrations of the SOM exemplified the adaptive formation of sensory maps in the brain,
and stipulated what functional properties are most essential to their formation.
In the early 1970s there were big steps made in pattern recognition (PR) techniques. They
continued the idea of adaptive networks that started the “artificial intelligence (AI)” research.
However, after the introduction of large time-shared computer systems, a lot of computer
scientists took a new course in the AI research, developing complex decision rules by ”heuristic
programming”, by which it became possible to implement, e.g., expert systems, computerized
control of large projects, etc. However, these rules were mainly designed manually.
Nonetheless there was a group of computer scientists who were not happy with this approach:
they wanted to continue the original ideas, and to develop computational methods for new
analytical tasks in information science, such as remote sensing, image analysis in medicine,
and speech recognition. This kind of Pattern Recognition was based on mathematical statistics,
and with the advent of new powerful computers, it too could be applied to large and important
problems.
Nevertheless, the connection between AI and PR research broke in the 1970s, and the AI
and PR conferences and societies started to operate separately. Although the Self-Organizing
Map research was started in the neural-networks context, its applications were actually
developed in experimental pattern-recognition research, which was using real data. It was first
found promising in speech recognition, but very soon numerous other applications were found
in industry, finance, and science. The SOM is nowadays regarded as a general data-analysis
method in a number of fields.
Data mining has a special flavor in data analysis. When a new research topic is started, one
usually has little understanding of the collected data. With time it happens that new, unexpected
results or phenomena will be found. The meaning often given to automated data mining is that
the method is able to discover new, unexpected and surprising results. The SOM literature
collects many simple experiments in which something quite unexpected shows up. Consider,
for instance, an SOM of some metallic elements, in which the ferromagnetic metals are mapped
to a tight cluster; this result was not expected, but the data analysis suggested that the
nonmagnetic properties of the metals must have a very strong correlation with the magnetic
ones. Or consider a two-class separation of mushrooms on the basis of their visible attributes
only: this clustering results in a dichotomy of edible vs. poisonous species. Similarly, when
color vectors are organized by the SOM and a special scaling of the color components is used,
the resulting color map coincides with the chromaticity map of human color vision, although
this result was in no way expected. Accordingly, it may be safe to say that the
SOM is a genuine data mining method, and it will find its most interesting applications in new
areas of science and technology. I may still mention a couple of other works from real life. In
1997, Naim et al. published a work [61] that clustered middle-distance galaxies according to
their morphology, and found a classification that they characterized as a new finding, compared
with the old standard classification performed by Hubble. In Finland, the pulp mill industries
were believed to represent the state of the art, but a research group was conducting
very careful studies of the process by the SOM method [1], and the process experts paid
attention to certain instabilities in the flow of pulp through a continuous digester. This
instability was not known before, and there was sufficient reason to change the internal
structures of the digester so that the instability disappeared.
It seems possible that the SOM will open new vistas into information science. Not only
does it already have numerous spin-offs in applications, but its role in the theory of cognition
is intriguing. However, its mathematics is still in its infancy and offers new problems especially
for mathematical statistics. A lot of high-level research is going on in this area. It may not be
an exaggeration to assert that the SOM presents a new information-processing paradigm, or at least
a philosophical line in bioinformatics.
To recapitulate, the SOM is a clustering method, but unlike the usual clustering methods, it is
also a topography-preserving nonlinear projection mapping. On the other hand, while the
other nonlinear projection mappings are also topography-preserving mappings, they do not
average data, like the SOM does.
3. Requirement Analysis with Illustrations
3.1 Basic Illustration
In this training our job was to cluster data using the Self-Organizing Map tool. For that we
used MATLAB, in which the SOM Toolbox was used; this toolbox has various built-in
functions for clustering. But to use a tool we first need to understand it and then give it the
required input. The small example below gives an idea of how this tool helps in clustering
data in MATLAB.
In this example we have data for 17 elements, each having 12 different properties, and this
whole data set was given to the SOM tool.
close all
clear all
clc
X = [2.7 23.8 1.34 5105 .7 187 .22 658 92.4 2450 2800 2.72
6.68 10.8 2.7 3400 .79 16 .051 630.5 38.9 1380 300 39.8
10.5 18.8 .99 2700 .8 357 .056 960.5 25 1930 520 1.58
22.42 6.59 .27 4900 5.2 51.5 .032 2454 28 4800 930 5.3
8.64 31.6 2.25 2300 .51 84 .08 320.5 13 767 240 7.25
8.8 12.6 .54 4720 2.08 60 .1 1489 67 3000 1550 6.8
19.3 14.3 .58 2080 .81 268 .031 1063 15.7 2710 420 2.21
8.92 16.2 .73 3900 1.2 338 .0928 1083 50 2360 1110 1.72
11.34 28.9 2.37 1320 .17 30 .03 327 5.9 1750 220 20.7
1.734 25.5 2.95 4600 .45 145 .25 650 46.5 1097 1350 4.3
8.9 12.9 .53 4970 2.03 53 .108 1450 63 3075 1480 7.35
12.16 11.04 .53 3000 1.15 60 .059 1555 36 2200 950 10.75
21.45 15.23 .36 2690 1.6 60 .032 1773 27 4300 600 10.5
7.86 12.3 .59 5100 2.2 50 .11 1530 66 2500 1520 9.9
7.14 17.1 1.69 3700 .43 97 .092 419.5 26.8 907 430 5.95
7.3 27 1.88 2600 .55 57 .05 231.9 14.2 2270 620 11.3
9.8 13.4 2.92 1800 .32 8.6 .029 271.3 14.1 1490 200 118];
for i = 1:12
mi = min(X(:,i));
ma = max(X(:,i));
X(:,i) = (X(:,i)-mi)/(ma - mi);
end
msize = [6 6];
lattice = 'hexa'; % hexagonal lattice
neigh = 'gaussian'; % neighborhood function
radius_coarse = [4 .5]; % [initial final]
trainlen_coarse = 50; % cycles in coarse training
radius_fine = [.5 .5]; % [initial final]
trainlen_fine = 10; % cycles in fine training
smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ...
'sheet');
smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ...
trainlen_coarse, 'neigh', neigh);
som_cplane('hexa',msize, 'none')
sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ...
trainlen_fine, 'neigh', neigh);
labels1 = 'ASAICCACPMNPPFZSB';
labels2 = 'lbgrdouubgidtenni';
M = sm.codebook;
norms2 = sum(M.*M,2);
op_mat=zeros(17,2);
t=1; % input serial no.
for u=1:17
X1 = X(u,:)';
Y = norms2 - 2*M*X1;
[C,c] = min(Y);
op_mat(u,1)=t;
op_mat(u,2)=c;
ch = mod(c-1,6) + 1;
cv = floor((c-1)/6) + 1;
if mod(cv,2) == 1
shift1 = -.15;
else
shift1 = .35;
end
if u==9 || u==11
shift2 = -.3;
elseif u==5 || u==14
shift2 = .3;
else
shift2 = 0;
end
% text(ch+shift1,cv+shift2,[labels1(u) ...
% labels2(u)], 'FontSize',15);
text(ch+shift1,cv+shift2,[labels1(u) ...
labels2(u)], 'FontSize',15);
t=t+1;
end
Now, if 17 different elements from the periodic table are given, one might expect them to be
arranged in 17 different clusters. But the result gave a surprise to all of us: some elements
shared a single cluster, telling us that there was something similar about those elements.
Have a look at the output:-
Fig 3.1 This is the cluster formed after the SOM tool worked with the input given to it. Some of the
elements were present in the same clusters due to similar properties.
You can see that Nickel, Cobalt and Iron are in the same cluster, since they share ferromagnetic
properties. Lead and Cadmium are also in the same cluster.
3.2 Commands in MATLAB
Now in the above code there are some commands used in MATLAB which help in the
clustering of this data. Few of them are:-
• min – finds the minimum value in a vector/matrix.
• max – finds the maximum value in a vector/matrix.
• som_lininit – initializes the map after calculating the eigenvectors.
• som_batchtrain – trains the given SOM with the given training data.
• som_cplane – visualizes a 2D component plane or U-matrix.
• sum – adds all the array elements.
• mod – gives the remainder after division.
• floor – rounds a number down to the nearest integer.
• text – creates a text object in the axes.
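One detail of the example code that deserves a note is how the winner node is found. Since
||m_i − x||² = ||m_i||² − 2·m_i'*x + ||x||², and the ||x||² term is the same for every node, it is enough
to minimize ||m_i||² − 2·m_i'*x; this is exactly what the lines norms2 = sum(M.*M,2) and
Y = norms2 - 2*M*X1 do. A minimal stand-alone sketch of the same idea (the variable names
here are illustrative, not from the report):

M = rand(36, 12);                 % codebook: one 12-dimensional model per node
x = rand(12, 1);                  % one input item as a column vector

% Direct winner search: node whose model has the smallest Euclidean distance to x
[~, cDirect] = min(sum(bsxfun(@minus, M, x').^2, 2));

% Shortcut used in the code above: the ||x||^2 term is common to all nodes,
% so it is enough to minimize ||m_i||^2 - 2*m_i'*x
norms2 = sum(M.*M, 2);
[~, cShort] = min(norms2 - 2*M*x);

isequal(cDirect, cShort)          % both searches return the same winner index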
Another adjustable element of the tool is the neighbourhood, which we can change according
to our needs and the data input. The SOM works on the basis of neurons: when the weights of
the neurons change, the output changes accordingly. Let me show this with the help of figures:
Fig 3.2 The red dots are the neurons, and the three black dots on either side are the input sequences.
Fig 3.3 When the upper sequence is selected, the neurons move towards that side: each neuron calculates
the distances and moves towards the sequence at the shorter distance, and the weights of the neurons
are changed accordingly.
Fig 3.4 Now the other sequence was selected and the neurons changed their weights.
So each time an input sequence is selected, the weights of the neurons get changed, and at the
end we have the final neuron weights. What is this weight of a neuron? It is the result of the
machine-learning process: we give the machine data, and the neurons learn the pattern in it and
perform the clustering accordingly. This is the basic principle of artificial intelligence.
3.3 SOM Array Size
It may be necessary to state first that the SOM visualizes the entire input-data space,
whereupon its density function ought to become clearly visible.
Assume next that we have enough statistical data items to visualize the clustering structures
of the data space with sufficient accuracy. It should then be realized that the SOM is a
quantizing method and has a limited spatial resolution for showing the details of the clusters.
Sometimes the data set may contain only a few clusters, whereupon a coarse resolution is
sufficient. However, if one suspects that there are interesting fine structures in the data, then a
larger array would be needed for sufficient resolution.
Histograms can be displayed on the SOM array: the number of input data items mapped onto
a node is displayed as a shade of gray or by a pseudo-color. The statistical accuracy of such a
histogram depends on how many input items are mapped per node on the average. A very
coarse rule of thumb is that about 50 input-data items per node on the average should be
sufficient; otherwise the resolution is limited by the sparsity of the data. So, in visualizing
clusters, a compromise must be made between resolution and statistical accuracy.
These aspects should be taken into account especially in statistical studies, where only a limited
number of samples are available.
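To make the histogram idea concrete, the sketch below counts how many input items have
their best match at each node of a trained map. It assumes the trained SOM Toolbox struct sm
and the normalized data matrix X from the earlier example; the toolbox also provides
ready-made functions for this, so the loop is only an illustration:

% Count how many input items have their best-matching model (winner) at each node
M = sm.codebook;                       % trained models, one per node
norms2 = sum(M.*M, 2);
hits = zeros(size(M,1), 1);
for u = 1:size(X,1)
    [~, c] = min(norms2 - 2*M*X(u,:)');    % winner node for input item u
    hits(c) = hits(c) + 1;
end
bar(hits)                              % hit counts per node; these counts are what would be
                                       % shown as shades of gray or pseudo-colors on the map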
Sizing the SOM by a trial-and-error method. It is not possible to estimate or even guess the
exact size of the array beforehand. It must be determined by the trial-and-error method, after
seeing the quality of the first guess. One may have to test several sizes of the SOM to check
that the cluster structures are shown with a sufficient resolution and statistical accuracy.
Typical SOM arrays range from a few dozen to a few hundred nodes. In special problems, such
as the mapping of documents onto the SOM array, even larger maps with, say, thousands of
nodes, are used. The largest map reported has been an SOM of seven million patent abstracts,
for which a one-million-node SOM was constructed.
On the other hand, the SOM may be at its best in the visualization of industrial processes, where
unlimited amounts of measurements can be recorded. Then the size of the SOM array is not
limited by the statistical accuracy but by the computational resources, especially if the SOM
has to be constructed periodically in real time, like in the control rooms of factories.
3.4 SOM Array Shape
Because the SOM is trying to represent the distribution of high-dimensional data items by a
two-dimensional projection image, it may be understandable that the scales of the horizontal
and vertical directions of the SOM array should approximately comply with the extensions of
the input-data distribution in the two principal dimensions, namely, those two orthogonal
directions in which the variances of the data are largest. In complete SOM software packages
there is usually an auxiliary function that makes a traditional two-dimensional image of a high-
dimensional distribution, e.g., the Sammon projection (cf., e.g., [39]) in the SOM Toolbox
program package. From its main extensions one can estimate visually what the approximate
ratio of the horizontal and vertical sides of the SOM array should be.
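If one prefers a purely numerical estimate, the side ratio can also be taken from the two largest
eigenvalues of the data covariance matrix, since their square roots measure the extents of the
data along the two principal directions. The following is only an illustrative rule of thumb (the
total of 100 nodes is an arbitrary choice, not from the report):

C = cov(X);                           % covariance of the (normalized) input data X
e = sort(eig(C), 'descend');          % eigenvalues, largest first
ratio = sqrt(e(1)/e(2));              % relative extents along the two principal directions

nNodes = 100;                         % desired total number of nodes (arbitrary choice)
side2 = round(sqrt(nNodes/ratio));    % shorter side of the array
side1 = round(ratio*side2);           % longer side of the array
msize = [side1 side2];                % e.g. passed to som_lininit(..., 'msize', msize)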
Special shapes of the array. There exist SOMs in which the array has not been selected as a
rectangular sheet. Its topology may resemble, e.g., a cylinder, a torus, or a sphere (cf., e.g., [80]).
There also exist special SOMs in which the structure and number of nodes of the array are
determined dynamically, depending on the received data; cf., e.g., [18]. The special topologies,
although requiring more cumbersome displays, may sometimes be justified, e.g., for the
following reasons.
1. The SOM is sometimes used to define the control conditions in industrial processes or
machinery automatically, directly controlling the actuators. A problem may occur with the
boundaries of the SOM sheet: there are distortions and discontinuities, which affect the control
stability. The toroidal topology seems to solve this problem, because there are then no
boundaries in the SOM. A similar effect is obtained by the spherical topology of the SOM.
(Cf. Subsections 4.5 and 5.3, however.)
2. There may exist data which are cyclic by nature. One may think, for example, of the
application of the SOM in musicology, where the degrees of the scales repeat by octaves. Either
the cylindrical or the toroidal topology will then map the tones cyclically onto the SOM.
The dynamical topology, which adjusts itself to structured data, is very interesting in itself.
There is one particular problem, however: one must be able to define the condition on which a
new structure (branching or cutting of the SOM network) is due. There do not exist universal
conditions of this type, and any numerical limit can only be defined arbitrarily. Accordingly,
the generated structure is then not unique. This same problem is encountered in other neural
network models.
4. SOM Algorithm
The first SOMs were constructed by a stepwise-recursive learning algorithm, where, at each
step, a selected patch of models in the SOM array was tuned towards the given input item, one
at a time. Consider again Fig. 2. Let the input data items X this time represent a sequence {x(t)}
of real n-dimensional Euclidean vectors x, where t, an integer, signifies a step in the sequence.
Let the M_i, being variable, successively attain the values of another sequence {m_i(t)} of
n-dimensional real vectors that represent the successively computed approximations of model
m_i. Here i is the spatial index of the node with which m_i is associated. The original SOM
algorithm assumes that the following process converges and produces the wanted ordered
values for the models:

m_i(t + 1) = m_i(t) + h_ci(t) [x(t) − m_i(t)] ,   (1)
where h_ci(t) is called the neighborhood function. The neighborhood function has the most
central role in self-organization. This function resembles the kernel that is applied in usual
smoothing processes. However, in the SOM, the subscript c is the index of a particular node
(the winner) in the array, namely, the one with the model m_c(t) that has the smallest Euclidean
distance from x(t):

c = arg min_i { ||x(t) − m_i(t)|| } .   (2)
Equations (1) and (2) can be illustrated as defining a recursive step where first the input data
item x(t) defines or selects the best-matching model (winner) according to Eq.(2). Then,
according to Eq.(1), the model at this node as well as at its spatial neighbors in the array are
modified. The modifications always take place in such a direction that the modified models
will match better with the input. The rates of the modifications at different nodes depend on
the mathematical form of the function h_ci(t). A much-applied choice for the neighborhood
function h_ci(t) is

h_ci(t) = α(t) exp[−sqdist(c, i) / (2σ²(t))] ,   (3)
where α(t) < 1 is a monotonically (e.g., hyperbolically, exponentially, or piecewise linearly)
decreasing scalar function of t, sqdist(c, i) is the square of the geometric distance between the
nodes c and i in the array, and σ(t) is another monotonically decreasing function of t.
The true mathematical form of σ(t) is not crucial, as long as its value is fairly large
in the beginning of the process, say, on the order of 20 per cent of the longer side of the SOM
array, after which it is gradually reduced to a small fraction of it, usually in a few thousand
steps. The topographic order is developed during this period. On the other hand, after this initial
phase of coarse ordering, the final convergence to nearly optimal values of the models takes
place, say, in an order of magnitude more steps, whereupon α(t) attains values on the order of
.01. For a sufficient statistical accuracy, every model must be updated sufficiently often.
However, we must give a warning: the final value of σ shall never go to zero, because otherwise
the process loses its ordering power. It should always remain, say, above half of the array
spacing. In very large SOM arrays, the final value of σ may be on the order of five per cent of
the shorter side of the array. There are also other possible choices for the mathematical form
of h_ci(t). One of them, the "bubble" form, is very simple; in it we have h_ci = 1 up to a
certain radius from the winner, and zero otherwise.
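The stepwise-recursive rule of Eqs. (1)–(3) can be sketched in MATLAB as follows. The code
is written directly from the formulas above rather than taken from the SOM Toolbox, and the
schedules chosen for α(t) and σ(t) are simple illustrative ones:

% X: input data with one n-dimensional item per row; msize: array size, e.g. [6 6].
% Requires implicit expansion (MATLAB R2016b or later); use bsxfun otherwise.
msize  = [6 6];
nNodes = prod(msize);
[gr, gc] = ndgrid(1:msize(1), 1:msize(2));     % node coordinates in the array
M = rand(nNodes, size(X,2));                   % random initial models m_i
T = 10000;                                     % total number of learning steps
for t = 1:T
    x = X(randi(size(X,1)), :);                % pick one input item x(t)
    [~, c] = min(sum((M - x).^2, 2));          % winner c, Eq. (2)
    alpha = 0.5*(1 - t/T) + 0.01;              % decreasing learning rate alpha(t) < 1
    sigma = max(0.5, 0.2*max(msize)*(1 - t/T));% decreasing radius, never down to zero
    sqdist = (gr(:) - gr(c)).^2 + (gc(:) - gc(c)).^2;  % squared node distances to c
    h = alpha*exp(-sqdist/(2*sigma^2));        % neighborhood function, Eq. (3)
    M = M + h.*(x - M);                        % update all models, Eq. (1)
end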
4.1 Stable state of the learning process
In the stationary state of learning, every model is the average of input items projected into its
neighborhood, weighted by the neighborhood function.
Assuming that the convergence to some stable state of the SOM is true, we require that the
expectation values of m_i(t + 1) and m_i(t) for t → ∞ must be equal, while h_ci is nonzero,
where c = c(x(t)) is the index of the winner node for input x(t). In other words we must have

∀i,  E_t{ h_ci (x(t) − m_i(t)) } = 0 .   (4)

Here E_t is the mathematical expectation value operator over t. In the assumed asymptotic
state, for t → ∞, the m_i(t) are independent of t and are denoted by m*_i. If the expectation
values E_t(.) are written, for t → ∞, as (1/t) Σ_t (.), we can write

m*_i = Σ_t h_ci(t) x(t) / Σ_t h_ci(t) .   (5)

This, however, is still an implicit expression, since c depends on x(t) and the m_i, and must be
solved iteratively. Nonetheless, Eq. (5) shall be used as the motivation of the iterative solution
for the m_i, known as the batch computation of the SOM ("Batch Map").
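Eq. (5) directly motivates one Batch Map iteration: every input is first assigned to its winner,
and every model is then replaced by the neighborhood-weighted average of the inputs. A
minimal sketch of a single such iteration (illustrative only; som_batchtrain, used elsewhere in
this report, implements a refined version of the same idea):

% One Batch Map iteration for models M (nNodes x n), data X (N x n),
% node coordinates gr, gc and a fixed neighborhood width sigma (all assumed given).
% Requires implicit expansion (R2016b or later).
[~, bmu] = min(sum(M.^2, 2) - 2*M*X', [], 1);        % winner node of every input item
sqd = (gr(:) - gr(:)').^2 + (gc(:) - gc(:)').^2;     % squared node-to-node distances
H = exp(-sqd/(2*sigma^2));                           % neighborhood weights between nodes
W = H(:, bmu);                                       % weight of input j on node i
M = (W*X) ./ sum(W, 2);                              % Eq. (5): neighborhood-weighted averages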
4.2 Initialization of the models
The learning process can be started with random vectors as the initial values of the model
vectors, but learning is sped up significantly, if certain regular initial values are given to the
models.
A special question concerns the selection of the initial values for the m_i. It has been
demonstrated by [39] that they can be selected even as random vectors, but a significantly
faster convergence follows if the initial values constitute a regular, two-dimensional sequence
of vectors taken along a hyperplane spanned by the two largest principal components of x (i.e.,
the principal components associated with the two highest eigenvalues); cf. [39]. This method
is called linear initialization.
The initialization of the models as random vectors was originally used only to demonstrate the
capability of the SOM to become ordered, starting from an arbitrary initial state. In practical
applications one expects to achieve the final ordering as quickly as possible, so the selection
of a good initial state may speed up the convergence of the algorithms by orders of magnitude.
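The idea behind linear initialization can be sketched as follows: compute the two principal
eigenvectors of the data covariance and place the initial models on a regular grid spanned by
them. This is only an illustration of the principle, not the actual implementation of som_lininit;
the spread of ±2 standard deviations is an arbitrary choice:

xm = mean(X, 1);                            % data mean
[V, D] = eig(cov(X));                       % eigenvectors/values of the data covariance
[d, idx] = sort(diag(D), 'descend');
v1 = V(:, idx(1))';  v2 = V(:, idx(2))';    % two largest principal directions (row vectors)

msize = [6 6];
M = zeros(prod(msize), size(X,2));
k = 0;
for i = 1:msize(1)
    for j = 1:msize(2)
        k = k + 1;
        % spread the grid over +/- 2 standard deviations along each principal direction
        a = 2*sqrt(d(1))*(2*(i-1)/(msize(1)-1) - 1);
        b = 2*sqrt(d(2))*(2*(j-1)/(msize(2)-1) - 1);
        M(k, :) = xm + a*v1 + b*v2;
    end
end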
4.3 Point density of the models (one-dimensional case)
It was stated in Subsec. 4.1 that in the stationary state of learning, every model vector is the
average of input items projected into its neighborhood, weighted by the neighborhood function.
However, this condition does not yet tell anything about the distribution of the model vectors,
or their point density.
To clarify what is thereby meant, we have to revert to the classical vector quantization, or the
k-means algorithm [19], [20], which differs from the SOM in that only the winners are updated
in training; in other words, the "neighborhood function" h_ci in k-means learning is equal to
δ_ci, where δ_ci = 1 if c = i, and δ_ci = 0 if c ≠ i.
No topographic order of the models is produced in the classical vector quantization, but its
mathematical theory is well established. In particular, it has been shown that the point density
q(x) of its model vectors depends on the probability density function p(x) of the input vectors
such that (in the Euclidean metric) q(x) = C · p(x)^(1/3), where C is a scalar constant. No similar
result has been derived for general vector dimensionalities in the SOM. In the case that (i) the
input items are scalar-valued, (ii) the SOM array is linear, i.e., a one-dimensional chain, (iii) the
neighborhood function is a box function with N neighbors on each side of the winner, and (iv)
the SOM contains a very large number of model vectors over a finite range, Ritter and
Schulten [74] have derived the following formula, where C is some constant:

q(x) = C · p(x)^r , where r = 2/3 − 1/(3N² + 3(N + 1)²) .

For instance, when N = 1 (one neighbor on each side of the winner), we have r = 0.60.
For Gaussian neighborhood functions, Dersch and Tavan have derived a similar result [15].
In other words, the point density of the models in the display is usually proportional to the
probability density function of the inputs, but not linearly: it is flatter. This phenomenon is not
harmful; as a matter of fact, it means that the SOM display is more uniform than it would be if it
represented the exact probability density of the inputs.
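As a quick numerical check of the exponent formula above (a small illustrative computation,
not part of the original experiments):

N = 1:4;
r = 2/3 - 1./(3*N.^2 + 3*(N+1).^2);
disp([N; r])      % N = 1 gives r = 0.60; r approaches 2/3 for larger neighborhoods,
                  % and since r < 1 the model density is always flatter than p(x)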
4.4 Border effects of the SOM
Another salient effect in the SOM, namely, in the planar sheet topology of the array, is that the
point density of the models at the borders of the array is a little distorted. This effect is
illustrated in the case in which the probability density of input is constant in a certain domain
(support of p(x)) and zero outside it. Consider again first the one-dimensional input and linear
SOM array.
In Fig. 4.1 we show a one-dimensional domain (support) [a, b], over which the probability
density function p(x) of a scalar-valued input variable x is constant. The inputs to the SOM
algorithm are picked up from this support at random. The set of ordered scalar-valued models
μi of the resulting one-dimensional SOM has been rendered on the x axis.
Fig. 4.1 Ordered model values μi over a one-dimensional domain.
Fig. 4.2. Converged model values μi over a one-dimensional domain of different lengths.
Numerical values of the μi for different lengths of the SOM array are shown in Fig.4.2. It is
discernible that the μi in the middle of the array are reasonably evenly spaced, but close to the
borders the first distance from the border is bigger, the second spacing is smaller, and the next
spacing is again larger than the average. This effect can be explained as follows: in equilibrium, in
the middle of the array, every μi must coincide with the centroid of set Si, where Si represents
all values of x that will be mapped to node i or its neighbors i − 1 and i + 1. However, at the
borders, the set S1 (all values of x mapped to node 1 or node 2) ranges from a to (μ2 + μ3)/2,
and the set S2 (values of x mapped to node 1, 2, or 3) ranges from a to (μ3 + μ4)/2. Similar
relations can be shown to hold near the other end of the chain for μl−1 and μl. Clearly the definition
of the above intervals is unsymmetric in the middle of the array and at the borders, which makes
the spacings different.
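The border effect can be reproduced with a few lines of MATLAB. The sketch below trains a
one-dimensional SOM with a bubble neighborhood (N = 1) on scalar inputs drawn uniformly
from [a, b]; all parameter values are illustrative:

a = 0; b = 1;                 % support [a, b] of the constant density p(x)
l = 10;                       % number of nodes in the linear array
mu = linspace(a, b, l);       % ordered initial model values
alpha = 0.02;                 % small constant learning rate, for simplicity
for t = 1:200000
    x = a + (b - a)*rand;               % random input from the uniform density
    [~, c] = min(abs(x - mu));          % winner node (Eq. (2) in one dimension)
    nb = max(1, c-1):min(l, c+1);       % winner and its immediate neighbors (bubble, N = 1)
    mu(nb) = mu(nb) + alpha*(x - mu(nb));   % update, Eq. (1)
end
disp(diff(mu))   % spacings near the borders differ from the even spacing in the middle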
5. Experiments during training
5.1 Basic Code of Clustering of Random Colours
As a first, baby step, a code was tried in which 10,000 randomly generated colour vectors were
given as input to the SOM tool. There are two radius parameters that we have to give so that
the SOM can calculate distances and place each item in the required cluster: radius_coarse and
radius_fine. radius_coarse gives the radius with which the training starts, within which the
neighbours with similar properties are checked and placed in the desired cluster; radius_fine
gives the final radius values. So, in effect, we start with a bigger circle for checking neighbours
and then reduce it. All these values are given by the user according to the needs of the data.
close
clear all
clc
msize = [25 25];
lattice = 'rect';
radius_coarse = [10 7];
radius_fine = [7 5];
trainlen_coarse = 50;
trainlen_fine = 50;
X = rand(10000,3); % random input (training) vectors to the SOM
smI = som_lininit(X,'msize',msize,'lattice',lattice,'shape', ...
'sheet');
smC = som_batchtrain(smI,X,'radius',radius_coarse,'trainlen', ...
trainlen_coarse, 'neigh','gaussian');
sm = som_batchtrain(smC,X,'radius',radius_fine,'trainlen', ...
trainlen_fine, 'neigh','gaussian');
M = sm.codebook;
C = zeros(25,25,3);
for i = 1:25
for j = 1:25
p = 25*(i-1) + j;
C(i,j,1) = M(p,1); C(i,j,2) = M(p,2); C(i,j,3) = M(p,3);
end
end
image(C)
Fig 5.1. Colours of similar shade lie near to each other, and clusters are formed accordingly.
5.2 Hyperspectral Analysis of Soil
In this experiment we had hyperspectral data for 25 different soils, containing the spectral
values of each soil at various wavelengths. We chose three different ranges of wavelengths:
visible, near infrared and shortwave infrared.
Aim: To take a range of spectral values and perform SOM clustering on it to make a spectral library.
Procedure: To identify which soil an arbitrary spectral value belongs to, we added noise of
different SNRs to the data and checked which noisy version gives a result equivalent to the
original soil value.
5.2.1 Soil Type – I Very Dark Grayish Brown Silty Loam
5.2.1.1 Visible Range
The visible spectrum is the portion of the electromagnetic spectrum that is visible to
the human eye. Electromagnetic radiation in this range of wavelengths is called visible light or
simply light. A typical human eye will respond to wavelengths from about 390 to 700 nm. In
terms of frequency, this corresponds to a band in the vicinity of 430–770 THz.
The spectrum does not, however, contain all the colors that the human eyes and brain can
distinguish. Unsaturated colors such as pink, or purple variations such as magenta, are absent,
for example, because they can be made only by a mix of multiple wavelengths. Colors
containing only one wavelength are also called pure colors or spectral colors.
Visible wavelengths pass through the "optical window", the region of the electromagnetic
spectrum that allows wavelengths to pass largely unattenuated through the Earth's atmosphere.
An example of this phenomenon is that clean air scatters blue light more than red wavelengths,
and so the midday sky appears blue. The optical window is also referred to as the "visible
window" because it overlaps the human visible response spectrum. The near infrared (NIR)
window lies just outside of human vision, as do the Medium Wavelength IR (MWIR)
window and the Long Wavelength or Far Infrared (LWIR or FIR) window, although other
animals may perceive them.
Colors that can be produced by visible light of a narrow band of wavelengths (monochromatic
light) are called pure spectral colors. The various color ranges indicated in the illustration are
an approximation: The spectrum is continuous, with no clear boundaries between one color
and the next.
This was a small introduction to the visible range of wavelengths. Choosing the visible range
gave us a portion of the spectrum, and we then experimented with it by adding white Gaussian
noise to the spectrum.
Now, what is white Gaussian noise?
Additive white Gaussian noise (AWGN) is a basic noise model used in information theory to
mimic the effect of many random processes that occur in nature. The modifiers denote specific
characteristics:
• Additive, because it is added to any noise that might be intrinsic to the information system.
• White, referring to the idea that it has uniform power across the frequency band of the
information system. It is an analogy to the color white, which has uniform emissions at all
frequencies in the visible spectrum.
• Gaussian, because it has a normal distribution in the time domain with an average time-domain
value of zero.
Wideband noise comes from many natural sources, such as the thermal vibrations of atoms in
conductors (referred to as thermal noise or Johnson-Nyquist noise), shot noise, black body
radiation from the earth and other warm objects, and from celestial sources such as the Sun.
The central limit theorem of probability theory indicates that the summation of many random
processes will tend to have a distribution called Gaussian or normal.
AWGN is often used as a channel model in which the only impairment to communication is a
linear addition of wideband or white noise with a constant spectral density (expressed
as watts per hertz of bandwidth) and a Gaussian distribution of amplitude. The model does not
account for fading, frequency selectivity, interference, nonlinearity or dispersion. However, it
produces simple and tractable mathematical models which are useful for gaining insight into
the underlying behavior of a system before these other phenomena are considered.
The AWGN channel is a good model for many satellite and deep space communication links.
It is not a good model for most terrestrial links because of multipath, terrain blocking,
interference, etc. However, for terrestrial path modeling, AWGN is commonly used to simulate
background noise of the channel under study, in addition to multipath, terrain blocking,
interference, ground clutter and self interference that modern radio systems encounter in
terrestrial operation.
Code for addition of Additive White Gaussian Noise
1. For SNR 60
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,60)
2. For SNR 58
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,58)
3. For SNR 56
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,56)
4. For SNR 54 dB
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,54)
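The four snippets above differ only in the SNR argument and can be collapsed into a single
loop. The sketch below also gathers the noisy spectra and plots them together; it assumes that
soil400.txt holds a single spectrum and relies on awgn's default convention that the signal
power is taken as 0 dBW unless the 'measured' option is given:

path(path,'F:TraineeDakshsoil3');
A = textread('soil400.txt');           % original spectrum
snr_dB = [60 58 56 54];
noisy = cell(size(snr_dB));
for k = 1:numel(snr_dB)
    noisy{k} = awgn(A, snr_dB(k));     % add white Gaussian noise at the given SNR
end
plot(A); hold on
for k = 1:numel(snr_dB)
    plot(noisy{k});
end
legend('original', 'SNR 60 dB', 'SNR 58 dB', 'SNR 56 dB', 'SNR 54 dB')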
After adding these different levels of SNR, we get a plot of five different spectral values.
Fig 5.2. Different spectral values of single type of soil (Very dark grayish brown silty loam).
After finding the spectral values at the different SNRs, we applied the same clustering
method to them.
Code for clustering
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
ATRANS=textread('soil400clus.txt');
X=ATRANS';
for i = 1:301
mi = min(X(:,i));
ma = max(X(:,i));
X(:,i) = (X(:,i)-mi)/(ma - mi);
end
msize = [4 4];
lattice = 'hexa'; % hexagonal lattice
neigh = 'gaussian'; % neighborhood function
radius_coarse = [4 .5]; % [initial final]
trainlen_coarse = 50; % cycles in coarse training
radius_fine = [.5 .5]; % [initial final]
trainlen_fine = 10; % cycles in fine training
smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ...
'sheet');
smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ...
trainlen_coarse, 'neigh', neigh);
som_cplane('rect',msize, 'none')
sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ...
trainlen_fine, 'neigh', neigh);
labels1 = 'SSSSSSS';
labels2 = '1234567';
M = sm.codebook;
norms2 = sum(M.*M,2);
op_mat=zeros(7,2);
t=1; % input serial no.
for u=1:3
X1 = X(u,:)';
Y = norms2 - 2*M*X1;
[C,c] = min(Y);
op_mat(u,1)=t;
op_mat(u,2)=c;
ch = mod(c-1,4) + 1;
cv = floor((c-1)/4) + 1;
if mod(cv,2) == 1
shift1 = -.15;
else
shift1 = .35;
end
if u==9 || u==11
shift2 = -.3;
elseif u==5 || u==14
shift2 = .3;
else
shift2 = 0;
end
% text(ch+shift1,cv+shift2,[labels1(u) ...
% labels2(u)], 'FontSize',15);
text(ch,cv+shift2,[labels1(u) ...
labels2(u)], 'FontSize',15);
t=t+1;
end
Fig 5.3. Cluster for Soil Type 1
Conclusion: In the above figure, S1 is the original value and the rest are the values with
added noise. We can see that S2 and S5 are near to S1 and hence have similar properties.
5.2.1.2 Near Infrared
Near-infrared spectroscopy (NIRS) is a spectroscopic method that uses the near-
infrared region of the electromagnetic spectrum (from about 700 nm to 2500 nm). Typical
applications include medical and physiological diagnostics and research including blood
sugar, pulse oximetry, functional neuroimaging, sports medicine, elite sports
training, ergonomics, rehabilitation, neonatal research, brain computer
interface, urology (bladder contraction), and neurology (neurovascular coupling). There are
also applications in other areas, such as pharmaceutical, food and agrochemical quality
control, atmospheric chemistry, and combustion research.
Near-infrared spectroscopy is based on molecular overtone and combination vibrations. Such
transitions are forbidden by the selection rules of quantum mechanics. As a result, the molar
absorptivity in the near-IR region is typically quite small. One advantage is that NIR can
typically penetrate much farther into a sample than mid infrared radiation. Near-infrared
spectroscopy is, therefore, not a particularly sensitive technique, but it can be very useful in
probing bulk material with little or no sample preparation.
The molecular overtone and combination bands seen in the near-IR are typically very broad,
leading to complex spectra; it can be difficult to assign specific features to specific chemical
components. Multivariate (multiple variables) calibration techniques (e.g., principal
components analysis, partial least squares, or artificial neural networks) are often employed to
extract the desired chemical information. Careful development of a set of calibration samples
and application of multivariate calibration techniques is essential for near-infrared analytical
methods.
Code for addition of Additive White Gaussian Noise
1. For SNR 60
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,60)
2. For SNR 58
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,58)
3. For SNR 56
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,56)
4. For SNR 54 dB
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,54)
After adding these different levels of SNR, we get a plot of five different spectral values.
Fig 5.4 Different spectral values of single type of soil (Very dark grayish brown silty loam).
After finding the spectral values at the different SNRs, we applied the same clustering
method to them.
Code for clustering
close all
clear all
clc
path(path,'C:UsersDTRLDesktopsoil');
ATRANS=textread('noiseclus700.txt');
X=ATRANS';
for i = 1:251
mi = min(X(:,i));
ma = max(X(:,i));
X(:,i) = (X(:,i)-mi)/(ma - mi);
end
msize = [4 4];
lattice = 'hexa'; % hexagonal lattice
neigh = 'gaussian'; % neighborhood function
radius_coarse = [4 .5]; % [initial final]
trainlen_coarse = 50; % cycles in coarse training
radius_fine = [.5 .5]; % [initial final]
trainlen_fine = 10; % cycles in fine training
smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ...
'sheet');
smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ...
trainlen_coarse, 'neigh', neigh);
som_cplane('rect',msize, 'none')
sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ...
trainlen_fine, 'neigh', neigh);
labels1 = 'SSSSSSS';
labels2 = '1234567';
M = sm.codebook;
norms2 = sum(M.*M,2);
op_mat=zeros(7,2);
t=1; % input serial no.
for u=1:7
X1 = X(u,:)';
Y = norms2 - 2*M*X1;
[C,c] = min(Y);
op_mat(u,1)=t;
op_mat(u,2)=c;
ch = mod(c-1,4) + 1;
cv = floor((c-1)/4) + 1;
if mod(cv,2) == 1
shift1 = -.15;
else
shift1 = .35;
end
if u==9 || u==11
shift2 = -.3;
elseif u==5 || u==14
shift2 = .3;
else
shift2 = 0;
end
% text(ch+shift1,cv+shift2,[labels1(u) ...
% labels2(u)], 'FontSize',15);
text(ch,cv+shift2,[labels1(u) ...
labels2(u)], 'FontSize',15);
t=t+1;
end
op_mat
Fig 5.5 Clustering for Soil Type I
Conclusion: In the above figure, S1 is the original value and the rest are the values with
added noise. We can see that S2 and S6 are near to S1 and hence have similar properties.
5.2.1.3 Shortwave Infrared
Short-wave infrared (SWIR) light is typically defined as light in the 0.9 – 1.7μm wavelength
range, but can also be classified from 0.7 – 2.5μm. Since silicon sensors have an upper limit of
approximately 1.0 μm, SWIR imaging requires unique optical and electronic components
capable of performing in the specific SWIR range. Indium gallium arsenide (InGaAs) sensors
are the primary sensors used in SWIR imaging, covering the typical SWIR range, but can
extend as low as 550 nm to as high as 2.5 μm. Although linear line-scan InGaAs sensors are
commercially available, area-scan InGaAs sensors are typically ITAR restricted. ITAR, the
International Traffic in Arms Regulations, is enforced by the government of the United States
of America. ITAR restricted products must adhere to strict export and import regulations for
them to be manufactured and/or sold within and outside of the USA. Nevertheless, lenses such
as SWIR ones can be used for a number of commercial applications with proper licenses.
Code for addition of Additive White Gaussian Noise
1. For SNR 60
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,60)
2. For SNR 58
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,58)
3. For SNR 56
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,56)
4. For SNR 54 dB
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,54)
After adding these different levels of SNR, we get a plot of five different spectral values.
Fig 5.6 Different spectral values of single type of soil (Very dark grayish brown silty loam).
After finding the spectral values at the different SNRs, we applied the same clustering
method to them.
Code for clustering
close all
clear all
clc
path(path,'C:UsersDTRLDesktopsoil2');
ATRANS=textread('soil1400clus.txt');
X=ATRANS';
for i = 1:424
mi = min(X(:,i));
ma = max(X(:,i));
X(:,i) = (X(:,i)-mi)/(ma - mi);
end
msize = [4 4];
lattice = 'hexa'; % hexagonal lattice
neigh = 'gaussian'; % neighborhood function
radius_coarse = [4 .5]; % [initial final]
trainlen_coarse = 50; % cycles in coarse training
radius_fine = [.5 .5]; % [initial final]
trainlen_fine = 10; % cycles in fine training
smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ...
'sheet');
smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ...
trainlen_coarse, 'neigh', neigh);
som_cplane('rect',msize, 'none')
sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ...
trainlen_fine, 'neigh', neigh);
labels1 = 'SSSSSSS';
labels2 = '1234567';
M = sm.codebook;
norms2 = sum(M.*M,2);
op_mat=zeros(7,2);
t=1; % input serial no.
for u=1:5
X1 = X(u,:)';
Y = norms2 - 2*M*X1;
[C,c] = min(Y);
op_mat(u,1)=t;
op_mat(u,2)=c;
ch = mod(c-1,4) + 1;
cv = floor((c-1)/4) + 1;
if mod(cv,2) == 1
shift1 = -.15;
else
shift1 = .35;
end
if u==9 || u==11
shift2 = -.3;
elseif u==5 || u==14
shift2 = .3;
else
shift2 = 0;
end
% text(ch+shift1,cv+shift2,[labels1(u) ...
% labels2(u)], 'FontSize',15);
text(ch,cv+shift2,[labels1(u) ...
labels2(u)], 'FontSize',15);
t=t+1;
end
op_mat
Fig 5.7. Cluster for Soil Type I
Conclusion: In the above figure, S1 is the original value and the rest are the values with
added noise. We can see that S2 and S3 are near to S1 and hence have similar properties.
5.2.2 Soil Type – II Grayish Brown Loam
5.2.2.1 Visible Range
Code for addition of Additive White Gaussian Noise
1. For SNR 60
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,60)
2. For SNR 58
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,58)
3. For SNR 56
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,56)
4. For SNR 54 dB
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,54)
After adding these different levels of SNR, we get a plot of five different spectral values.
Fig 5.8. Different spectral values of single type of soil (Grayish brown loam).
After finding the spectral values at the different SNRs, we applied the same clustering
method to them.
Code for clustering
close all
clear all
clc
path(path,'C:UsersDTRLDesktopsoil2');
ATRANS=textread('soil400clus.txt');
X=ATRANS';
for i = 1:301
mi = min(X(:,i));
ma = max(X(:,i));
X(:,i) = (X(:,i)-mi)/(ma - mi);
end
msize = [4 4];
lattice = 'hexa'; % hexagonal lattice
neigh = 'gaussian'; % neighborhood function
radius_coarse = [4 .5]; % [initial final]
trainlen_coarse = 50; % cycles in coarse training
radius_fine = [.5 .5]; % [initial final]
trainlen_fine = 10; % cycles in fine training
smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ...
'sheet');
smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ...
trainlen_coarse, 'neigh', neigh);
som_cplane('rect',msize, 'none')
sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ...
trainlen_fine, 'neigh', neigh);
labels1 = 'SSSSSSS';
labels2 = '1234567';
M = sm.codebook;
norms2 = sum(M.*M,2);
op_mat=zeros(7,2);
t=1; % input serial no.
for u=1:7
X1 = X(u,:)';
Y = norms2 - 2*M*X1;
[C,c] = min(Y);
op_mat(u,1)=t;
op_mat(u,2)=c;
ch = mod(c-1,4) + 1;
cv = floor((c-1)/4) + 1;
if mod(cv,2) == 1
shift1 = -.15;
else
shift1 = .35;
end
if u==9 || u==11
shift2 = -.3;
elseif u==5 || u==14
shift2 = .3;
else
shift2 = 0;
end
% text(ch+shift1,cv+shift2,[labels1(u) ...
% labels2(u)], 'FontSize',15);
text(ch,cv+shift2,[labels1(u) ...
labels2(u)], 'FontSize',15);
t=t+1;
end
op_mat
Fig 5.9. Clustering for Soil Type II
Conclusion: In the above figure, S1 is the original value and the rest are the values with
added noise. We can see that none is near to S1; hence all are different.
5.2.2.2 Near Infrared
Code for addition of Additive White Gaussian Noise
1. For SNR 60
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,60)
2. For SNR 58
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,58)
3. For SNR 56
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,56)
4. For SNR 54 dB
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,54)
After adding these different levels of SNR, we get a plot of five different spectral values.
Fig 5.10. Different spectral values of single type of soil (Grayish brown loam).
After finding the spectral values at the different SNRs, we applied the same clustering
method to them.
Code for clustering
close all
clear all
clc
path(path,'C:UsersDTRLDesktopsoil2');
ATRANS=textread('soil700clus.txt');
X=ATRANS';
for i = 1:185
mi = min(X(:,i));
ma = max(X(:,i));
X(:,i) = (X(:,i)-mi)/(ma - mi);
end
msize = [4 4];
lattice = 'hexa'; % hexagonal lattice
neigh = 'gaussian'; % neighborhood function
radius_coarse = [4 .5]; % [initial final]
trainlen_coarse = 50; % cycles in coarse training
radius_fine = [.5 .5]; % [initial final]
trainlen_fine = 10; % cycles in fine training
smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ...
'sheet');
smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ...
trainlen_coarse, 'neigh', neigh);
som_cplane('rect',msize, 'none')
sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ...
trainlen_fine, 'neigh', neigh);
labels1 = 'SSSSSSS';
labels2 = '1234567';
M = sm.codebook;
norms2 = sum(M.*M,2);
op_mat=zeros(7,2);
t=1; % input serial no.
for u=1:5
X1 = X(u,:)';
Y = norms2 - 2*M*X1;
[C,c] = min(Y);
op_mat(u,1)=t;
op_mat(u,2)=c;
ch = mod(c-1,4) + 1;
cv = floor((c-1)/4) + 1;
if mod(cv,2) == 1
shift1 = -.15;
else
shift1 = .35;
end
if u==9 || u==11
shift2 = -.3;
elseif u==5 || u==14
shift2 = .3;
else
shift2 = 0;
end
% text(ch+shift1,cv+shift2,[labels1(u) ...
% labels2(u)], 'FontSize',15);
text(ch,cv+shift2,[labels1(u) ...
labels2(u)], 'FontSize',15);
t=t+1;
end
op_mat
Fig 5.11. Clustering for Soil Type II
Conclusion: In the above figure, S1 is the original spectrum and the rest are the noise-added
versions. S2 is mapped close to S1, so the two share the same properties.
5.2.2.3 Shortwave Infrared
Code for addition of Additive White Gaussian Noise
1. For SNR 60 dB
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,60)
2. For SNR 58 dB
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,58)
3. For SNR 56 dB
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,56)
4. For SNR 54 dB
close all
clear all
clc
path(path,'F:TraineeDakshsoil3');
A=textread('soil400.txt');
noise=awgn(A,54)
After adding noise at the different SNR levels, we obtain a plot of five different spectral curves.
Fig 5.12. Different spectral values of a single type of soil (Grayish brown loam).
After obtaining the spectral values at the different SNR levels, we applied the same clustering
method to them.
Code for clustering
close all
clear all
clc
path(path,'C:UsersDTRLDesktopsoil2');
ATRANS=textread('soil700clus.txt');
X=ATRANS';
for i = 1:424
mi = min(X(:,i));
ma = max(X(:,i));
X(:,i) = (X(:,i)-mi)/(ma - mi);
end
msize = [4 4];
lattice = 'hexa'; % hexagonal lattice
neigh = 'gaussian'; % neighborhood function
radius_coarse = [4 .5]; % [initial final]
trainlen_coarse = 50; % cycles in coarse training
radius_fine = [.5 .5]; % [initial final]
trainlen_fine = 10; % cycles in fine training
smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ...
'sheet');
smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ...
trainlen_coarse, 'neigh', neigh);
som_cplane('rect',msize, 'none')
sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ...
trainlen_fine, 'neigh', neigh);
labels1 = 'SSSSSSS';
labels2 = '1234567';
M = sm.codebook;
norms2 = sum(M.*M,2);
op_mat=zeros(7,2);
t=1; % input serial no.
for u=1:5
X1 = X(u,:)';
Y = norms2 - 2*M*X1;
[C,c] = min(Y);
op_mat(u,1)=t;
op_mat(u,2)=c;
ch = mod(c-1,4) + 1;
cv = floor((c-1)/4) + 1;
if mod(cv,2) == 1
shift1 = -.15;
else
shift1 = .35;
end
if u==9 || u==11
shift2 = -.3;
elseif u==5 || u==14
shift2 = .3;
else
shift2 = 0;
end
% text(ch+shift1,cv+shift2,[labels1(u) ...
% labels2(u)], 'FontSize',15);
text(ch,cv+shift2,[labels1(u) ...
labels2(u)], 'FontSize',15);
t=t+1;
end
op_mat
Fig 5.13. Clustering for Soil Type II
Conclusion: In the above figure, S1 is the original spectrum and the rest are the noise-added
versions. None of them is mapped close to S1.
A similar experiment was carried out with the other 22 soils.
5.3 Polarimetry Analysis of Different Regions
In this experiment we had different types of terrain, such as urban, water and vegetated areas,
so we decided to characterize the different terrains by calculating their co- and cross-polarization
responses.
Aim: First we plotted the 3D surfaces for all the co- and cross-polarization values in the different
regions and examined the changes in the plots. Then we applied SOM clustering to these values to
see whether it leads to the same conclusions.
Procedure: For the clustering, the polarization arrays were decomposed with the Haar wavelet at
different levels. The wavelet coefficients were extracted and used as the input to the SOM for
clustering.
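A condensed sketch of this procedure is given below; the file name and decomposition level are placeholders, and the full scripts actually used for each level follow in the next subsections.
% Hedged outline: decompose one co/cross polarisation matrix with the Haar
% wavelet, keep the detail coefficients, and use them as one SOM input vector.
X     = textread('urbanroi1_co.txt');   % one polarisation response matrix
[c,s] = wavedec2(X, 2, 'haar');         % 2-D Haar decomposition at level 2
D     = detcoef2('h', c, s, 2);         % horizontal detail coefficients at level 2
feat  = D(:);                           % flatten into a single feature vector
% Repeat for the other regions of interest, stack the vectors into a matrix,
% normalise column-wise, and train the SOM with som_lininit/som_batchtrain
% exactly as in the clustering scripts below.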
Theory:
Cross – Polarization
Cross-polarized wave (XPW) generation is a nonlinear optical process that can be classified
among the frequency-degenerate four-wave-mixing processes. It can take place only in media with
anisotropy of the third-order nonlinearity. As a result of this nonlinear optical interaction, a new
linearly polarized wave is generated at the output of the nonlinear crystal at the same frequency,
but with its polarization oriented perpendicularly to the polarization of the input wave.
Fig 5.14.
A simplified optical scheme for XPW generation is shown in Fig. 5.14. It consists of a nonlinear
crystal plate (1-2 mm thick) sandwiched between two crossed polarizers. The intensity of the
generated XPW has a cubic dependence on the intensity of the input wave; this is the main reason
the effect is so successful for improving the contrast of the temporal and spatial profiles of
femtosecond pulses. Since cubic crystals are used as the nonlinear media, they are isotropic with
respect to their linear properties (there is no birefringence), so the phase and group velocities of
the XPW and of the fundamental wave (FW) are equal: V_XPW = V_FW and V_gr,XPW = V_gr,FW.
The consequence is ideal phase- and group-velocity matching for the two waves propagating along
the crystal, which allows very good efficiency of the XPW generation process with minimal
distortion of the pulse shape and spectrum.
Wavelets
A wavelet is a wave-like oscillation with an amplitude that begins at zero, increases, and then
decreases back to zero. It can typically be visualized as a "brief oscillation" like one might see
recorded by a seismograph or heart monitor. Generally, wavelets are purposefully crafted to
have specific properties that make them useful for signal processing. Wavelets can be
combined, using a "reverse, shift, multiply and integrate" technique called convolution, with
portions of a known signal to extract information from the unknown signal.
For example, a wavelet could be created to have a frequency of Middle C and a short duration
of roughly a 32nd note. If this wavelet was to be convolved with a signal created from the
recording of a song, then the resulting signal would be useful for determining when the Middle
C note was being played in the song. Mathematically, the wavelet will correlate with the signal
if the unknown signal contains information of similar frequency. This concept of correlation is
at the core of many practical applications of wavelet theory.
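As a rough illustration of this idea (a toy example of my own, not taken from the training data), one can correlate a short middle-C wavelet with a longer signal and look for where the response is largest:
% Toy sketch: find where a middle-C (about 261.6 Hz) burst occurs in a signal
% by convolving the signal with a short wavelet of the same frequency.
fs = 8000;                                        % assumed sampling rate
x  = [zeros(1,2000), sin(2*pi*261.6*(0:1/fs:0.1)), zeros(1,5200)];
tw = -0.02:1/fs:0.02;                             % short support for the wavelet
w  = exp(-(200*tw).^2) .* cos(2*pi*261.6*tw);     % Gaussian-windowed oscillation
r  = conv(x, fliplr(w), 'same');                  % correlation via convolution
[~, k] = max(abs(r));
fprintf('strongest response near t = %.3f s\n', k/fs);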
As a mathematical tool, wavelets can be used to extract information from many different kinds
of data, including – but certainly not limited to – audio signals and images. Sets of wavelets
are generally needed to analyze data fully. A set of "complementary" wavelets will decompose
data without gaps or overlap so that the decomposition process is mathematically reversible.
Thus, sets of complementary wavelets are useful in wavelet based compression/decompression
algorithms where it is desirable to recover the original information with minimal loss.
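A small sketch of that reversibility, using the Haar filters from MATLAB's Wavelet Toolbox (again an illustrative example, not part of the report's scripts):
% Decompose a test signal with the Haar wavelet and reconstruct it without loss.
x        = rand(1, 256);              % arbitrary test signal of even length
[cA, cD] = dwt(x, 'haar');            % one-level decomposition (approximation, detail)
xr       = idwt(cA, cD, 'haar');      % inverse transform
max(abs(x - xr))                      % reconstruction error is of the order of 1e-15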
In formal terms, this representation is a wavelet series representation of a square-integrable
function with respect to either a complete, orthonormal set of basis functions, or
an overcomplete set or frame of a vector space, for the Hilbert space of square integrable
functions.
Haar Wavelet
In mathematics, the Haar wavelet is a sequence of rescaled "square-shaped" functions which
together form a wavelet family or basis. Wavelet analysis is similar to Fourier analysis in that
it allows a target function over an interval to be represented in terms of an orthonormal basis.
The Haar sequence is now recognised as the first known wavelet basis and is extensively used as a
teaching example.
Fig 5.15. Haar Wavelet
The Haar sequence was proposed in 1909 by Alfréd Haar, who used these functions to give an
example of an orthonormal system for the space of square-integrable functions on the unit interval
[0, 1]. The study of wavelets, and even the term "wavelet", did not come until much later. As a
special case of the Daubechies wavelet, the Haar wavelet is also known as Db1.
The Haar wavelet is also the simplest possible wavelet. The technical disadvantage of the
Haar wavelet is that it is not continuous, and therefore not differentiable. This property can,
however, be an advantage for the analysis of signals with sudden transitions, such as
monitoring of tool failure in machines.
The Haar wavelet's mother wavelet function ψ(t) can be described as ψ(t) = 1 for 0 ≤ t < 1/2,
ψ(t) = -1 for 1/2 ≤ t < 1, and ψ(t) = 0 otherwise. Its scaling function φ(t) can be described as
φ(t) = 1 for 0 ≤ t < 1 and φ(t) = 0 otherwise.
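These two functions can also be generated and plotted directly with the Wavelet Toolbox; the following short snippet is illustrative only and is not part of the training scripts.
% Plot the Haar scaling function phi and the Haar mother wavelet psi.
[phi, psi, xval] = wavefun('haar', 10);    % 10 refinement iterations
subplot(1,2,1); plot(xval, phi); axis([0 1 -1.5 1.5]); title('Haar scaling function');
subplot(1,2,2); plot(xval, psi); axis([0 1 -1.5 1.5]); title('Haar mother wavelet');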
5.3.1 Urban Region
First we plotted the 3D surfaces for all the co- and cross-polarization values in the urban region
and examined the changes in the plots.
Code for 3D Plot
close all
clear all
clc
path(path,'F:Traineeurbanroidata');
X=textread('urbanroi1_co.txt');
path(path,'F:Traineeurbanroidata');
A=textread('urbanroi2_co.txt');
path(path,'F:Traineeurbanroidata');
B=textread('urbanroi3_co.txt');
path(path,'F:Traineeurbanroidata');
C=textread('urbanroi4_co.txt');
path(path,'F:Traineeurbanroidata');
D=textread('urbanroi5_co.txt');
path(path,'F:Traineeurbanroidata');
E=textread('urbanroi1_cross.txt');
path(path,'F:Traineeurbanroidata');
F=textread('urbanroi2_cross.txt');
path(path,'F:Traineeurbanroidata');
G=textread('urbanroi3_cross.txt');
path(path,'F:Traineeurbanroidata');
H=textread('urbanroi4_cross.txt');
path(path,'F:Traineeurbanroidata');
I=textread('urbanroi5_cross.txt');
phi_orient=(linspace(0,180,181))';
xi_ellipt=(linspace(-45,45,91))';
subplot(5,2,1)
mesh(xi_ellipt,phi_orient,X)
colorbar
axis([-45 45 0 180 0 1])
title('CoPolarisation Plot 1')
xlabel('<-----ellipt xi ------>', 'rotation', 15)
ylabel('<----orient phi ------>', 'rotation', -20)
%figure
subplot(5,2,2)
mesh(xi_ellipt,phi_orient,A)
colorbar
axis([-45 45 0 180 0 1])
title('CoPolarisation Plot 2')
xlabel('<-----ellipt xi ------>', 'rotation', 15)
ylabel('<----orient phi ------>', 'rotation', -20)
%figure
subplot(5,2,3)
mesh(xi_ellipt,phi_orient,B)
colorbar
axis([-45 45 0 180 0 1])
title('CoPolarisation Plot 3')
xlabel('<-----ellipt xi ------>', 'rotation', 15)
ylabel('<----orient phi ------>', 'rotation', -20)
%figure
subplot(5,2,4)
mesh(xi_ellipt,phi_orient,C)
colorbar
axis([-45 45 0 180 0 1])
title('CoPolarisation Plot 4')
xlabel('<-----ellipt xi ------>', 'rotation', 15)
ylabel('<----orient phi ------>', 'rotation', -20)
%figure
subplot(5,2,5)
mesh(xi_ellipt,phi_orient,D)
colorbar
axis([-45 45 0 180 0 1])
title('CoPolarisation Plot 5')
xlabel('<-----ellipt xi ------>', 'rotation', 15)
ylabel('<----orient phi ------>', 'rotation', -20)
%figure
subplot(5,2,6)
mesh(xi_ellipt,phi_orient,E)
colorbar
axis([-45 45 0 180 0 1])
title('CrossPolarisation Plot 1')
xlabel('<-----ellipt xi ------>', 'rotation', 15)
ylabel('<----orient phi ------>', 'rotation', -20)
%figure
subplot(5,2,7)
mesh(xi_ellipt,phi_orient,F)
colorbar
axis([-45 45 0 180 0 1])
title('CrossPolarisation Plot 2')
xlabel('<-----ellipt xi ------>', 'rotation', 15)
ylabel('<----orient phi ------>', 'rotation', -20)
%figure
subplot(5,2,8)
mesh(xi_ellipt,phi_orient,G)
colorbar
axis([-45 45 0 180 0 1])
title('CrossPolarisation Plot 3')
xlabel('<-----ellipt xi ------>', 'rotation', 15)
ylabel('<----orient phi ------>', 'rotation', -20)
%figure
subplot(5,2,9)
mesh(xi_ellipt,phi_orient,H)
colorbar
axis([-45 45 0 180 0 1])
title('CrossPolarisation Plot 4')
xlabel('<-----ellipt xi ------>', 'rotation', 15)
ylabel('<----orient phi ------>', 'rotation', -20)
%figure
subplot(5,2,10)
mesh(xi_ellipt,phi_orient,I)
colorbar
axis([-45 45 0 180 0 1])
title('CrossPolarisation Plot 5')
xlabel('<-----ellipt xi ------>', 'rotation', 15)
ylabel('<----orient phi ------>', 'rotation', -20)
%figure
Fig 5.16. 3D Plot
Code for finding the Wavelet Coefficient (Level 2)
clear all
clc
close all
% The current extension mode is zero-padding (see dwtmode).
% Load original image.
% X contains the loaded image.
% Perform decomposition at level 2
% of X using db1.
path(path,'F:Trainee');
X=textread('urbanroi3_cross.txt');
[c,s] = wavedec2(X,2,'haar');
sizex = size(X)
sizec = size(c)
val_s = s
% Extract details coefficients at level 2
% in each orientation, from wavelet decomposition
% structure [c,s].
D = detcoef2('c',c,s,2)
Code for clustering data at Level 2
close all
clear all
clc
path(path,'F:Traineeurbanroidatahaar');
X=textread('totaln3.txt');
for i = 1:5
mi = min(X(:,i));
ma = max(X(:,i));
X(:,i) = (X(:,i)-mi)/(ma - mi);
end
msize = [4 4];
lattice = 'hexa'; % hexagonal lattice
neigh = 'gaussian'; % neighborhood function
radius_coarse = [4 .5]; % [initial final]
trainlen_coarse = 50; % cycles in coarse training
radius_fine = [.5 .5]; % [initial final]
trainlen_fine = 10; % cycles in fine training
smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ...
'sheet');
smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ...
trainlen_coarse, 'neigh', neigh);
som_cplane('rect',msize, 'none')
sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ...
trainlen_fine, 'neigh', neigh);
labels1 = 'CCCCC';
labels2 = '12345';
M = sm.codebook;
norms2 = sum(M.*M,2);
op_mat=zeros(5,2);
t=1; % input serial no.
for u=1:6348
X1 = X(u,:)';
Y = norms2 - 2*M*X1;
[C,c] = min(Y);
op_mat(u,1)=t;
op_mat(u,2)=c;
ch = mod(c-1,4) + 1;
cv = floor((c-1)/4) + 1;
if mod(cv,2) == 1
shift1 = -.15;
else
shift1 = .35;
end
if u==9 || u==11
shift2 = -.3;
elseif u==5 || u==14
shift2 = .3;
else
shift2 = 0;
end
% text(ch+shift1,cv+shift2,[labels1(u) ...
% labels2(u)], 'FontSize',15);
text(ch,cv+shift2,[labels1(u) ...
labels2(u)], 'FontSize',15);
t=t+1;
end
op_mat
Fig 5.17. Clustering at Level 2
Conclusion: In the above figure, each label is a different class (co- or cross-polarization) derived
from the wavelet coefficients. C3 and C4 are mapped close to each other, so they share the same
properties.
Code for finding the Wavelet Coefficient (Level 3)
clear all
clc
close all
% The current extension mode is zero-padding (see dwtmode).
% Load original image.
% X contains the loaded image.
% Perform decomposition at level 3
% of X using db1.
path(path,'F:Traineeurbanroidata');
X=textread('urbanroi4_co.txt');
[c,s] = wavedec2(X,3,'haar');
sizex = size(X)
sizec = size(c)
val_s = s
% Extract details coefficients at level 3
% in each orientation, from wavelet decomposition
% structure [c,s].
D = detcoef2('c',c,s,3)
Code for clustering data at Level 3
close all
clear all
clc
path(path,'F:Traineeurbanroidatahaar');
X=textread('totaln3.txt');
for i = 1:5
mi = min(X(:,i));
ma = max(X(:,i));
X(:,i) = (X(:,i)-mi)/(ma - mi);
end
msize = [4 4];
lattice = 'hexa'; % hexagonal lattice
neigh = 'gaussian'; % neighborhood function
radius_coarse = [4 .5]; % [initial final]
trainlen_coarse = 50; % cycles in coarse training
radius_fine = [.5 .5]; % [initial final]
trainlen_fine = 10; % cycles in fine training
smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ...
'sheet');
smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ...
trainlen_coarse, 'neigh', neigh);
som_cplane('rect',msize, 'none')
sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ...
trainlen_fine, 'neigh', neigh);
labels1 = 'CCCCC';
labels2 = '12345';
M = sm.codebook;
norms2 = sum(M.*M,2);
op_mat=zeros(5,2);
t=1; % input serial no.
for u=1:1656
X1 = X(u,:)';
Y = norms2 - 2*M*X1;
[C,c] = min(Y);
op_mat(u,1)=t;
op_mat(u,2)=c;
ch = mod(c-1,4) + 1;
cv = floor((c-1)/4) + 1;
if mod(cv,2) == 1
shift1 = -.15;
else
shift1 = .35;
end
if u==9 || u==11
shift2 = -.3;
elseif u==5 || u==14
shift2 = .3;
else
shift2 = 0;
end
% text(ch+shift1,cv+shift2,[labels1(u) ...
% labels2(u)], 'FontSize',15);
text(ch,cv+shift2,[labels1(u) ...
labels2(u)], 'FontSize',15);
t=t+1;
end
op_mat
Fig 5.18. Clustering at Level 3
Conclusion: In the above figure, each label is a different class (co- or cross-polarization) derived
from the wavelet coefficients. C3 and C4 are mapped close to each other, so they share the same
properties.
Code for finding the Wavelet Coefficient (Level 4)
clear all
clc
close all
% The current extension mode is zero-padding (see dwtmode).
% Load original image.
% X contains the loaded image.
% Perform decomposition at level 4
% of X using db1.
path(path,'F:Traineeurbanroidata');
X=textread('urbanroi3_cross.txt');
[c,s] = wavedec2(X,4,'haar');
sizex = size(X)
sizec = size(c)
val_s = s
% Extract details coefficients at level 4
% in each orientation, from wavelet decomposition
% structure [c,s].
D = detcoef2('c',c,s,4)
Code for clustering data at Level 4
close all
clear all
clc
path(path,'F:Traineeurbanroidatahaar');
X=textread('totaln4.txt');
for i = 1:5
mi = min(X(:,i));
ma = max(X(:,i));
X(:,i) = (X(:,i)-mi)/(ma - mi);
end
msize = [4 4];
lattice = 'hexa'; % hexagonal lattice
neigh = 'gaussian'; % neighborhood function
radius_coarse = [4 .5]; % [initial final]
trainlen_coarse = 50; % cycles in coarse training
radius_fine = [.5 .5]; % [initial final]
trainlen_fine = 10; % cycles in fine training
smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ...
'sheet');
smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ...
trainlen_coarse, 'neigh', neigh);
som_cplane('rect',msize, 'none')
sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ...
trainlen_fine, 'neigh', neigh);
labels1 = 'CCCCC';
labels2 = '12345';
M = sm.codebook;
norms2 = sum(M.*M,2);
op_mat=zeros(5,2);
t=1; % input serial no.
for u=1:432
X1 = X(u,:)';
Y = norms2 - 2*M*X1;
[C,c] = min(Y);
op_mat(u,1)=t;
op_mat(u,2)=c;
ch = mod(c-1,4) + 1;
cv = floor((c-1)/4) + 1;
if mod(cv,2) == 1
shift1 = -.15;
else
shift1 = .35;
end
if u==9 || u==11
shift2 = -.3;
elseif u==5 || u==14
shift2 = .3;
else
shift2 = 0;
end
% text(ch+shift1,cv+shift2,[labels1(u) ...
% labels2(u)], 'FontSize',15);
text(ch,cv+shift2,[labels1(u) ...
labels2(u)], 'FontSize',15);
t=t+1;
end
op_mat
Fig 5.19. Clustering at Level 4
Conclusion: In the above figure, each label is a different class (co- or cross-polarization) derived
from the wavelet coefficients. C1 and C5 are mapped close to each other, so they share the same
properties.
Code for finding the Wavelet Coefficient (Level 5)
clear all
clc
close all
% The current extension mode is zero-padding (see dwtmode).
% Load original image.
% X contains the loaded image.
% Perform decomposition at level 5
% of X using db1.
path(path,'F:Trainee');
X=textread('urbanroi5_co.txt');
[c,s] = wavedec2(X,5,'haar');
sizex = size(X)
sizec = size(c)
val_s = s
% Extract details coefficients at level 5
% in each orientation, from wavelet decomposition
% structure [c,s].
D = detcoef2('c',c,s,5)
Code for clustering data at Level 5
close all
clear all
clc
path(path,'F:Traineeurbanroidatahaar');
X=textread('totaln5.txt');
for i = 1:5
mi = min(X(:,i));
ma = max(X(:,i));
X(:,i) = (X(:,i)-mi)/(ma - mi);
end
msize = [4 4];
lattice = 'hexa'; % hexagonal lattice
neigh = 'gaussian'; % neighborhood function
radius_coarse = [4 .5]; % [initial final]
trainlen_coarse = 50; % cycles in coarse training
radius_fine = [.5 .5]; % [initial final]
trainlen_fine = 10; % cycles in fine training
smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ...
'sheet');
smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ...
trainlen_coarse, 'neigh', neigh);
som_cplane('rect',msize, 'none')
sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ...
trainlen_fine, 'neigh', neigh);
labels1 = 'CCCCC';
labels2 = '12345';
M = sm.codebook;
norms2 = sum(M.*M,2);
op_mat=zeros(5,2);
t=1; % input serial no.
for u=1:108
X1 = X(u,:)';
Y = norms2 - 2*M*X1;
[C,c] = min(Y);
op_mat(u,1)=t;
op_mat(u,2)=c;
ch = mod(c-1,4) + 1;
cv = floor((c-1)/4) + 1;
if mod(cv,2) == 1
shift1 = -.15;
else
shift1 = .35;
end
if u==9 || u==11
shift2 = -.3;
elseif u==5 || u==14
shift2 = .3;
else
shift2 = 0;
end
% text(ch+shift1,cv+shift2,[labels1(u) ...
% labels2(u)], 'FontSize',15);
text(ch,cv+shift2,[labels1(u) ...
labels2(u)], 'FontSize',15);
t=t+1;
end
op_mat
Fig 5.20. Clustering at Level 5
Conclusion: In the above figure, each label is a different class (co- or cross-polarization) derived
from the wavelet coefficients. C1 and C4 are mapped close to each other, so they share the same
properties.
Code for finding the Wavelet Coefficient (Level 6)
clear all
clc
close all
% The current extension mode is zero-padding (see dwtmode).
% Load original image.
% X contains the loaded image.
% Perform decomposition at level 6
% of X using db1.
path(path,'F:Trainee');
X=textread('urbanroi5_cross.txt');
[c,s] = wavedec2(X,6,'haar');
sizex = size(X)
sizec = size(c)
val_s = s
% Extract details coefficients at level 6
% in each orientation, from wavelet decomposition
% structure [c,s].
D = detcoef2('c',c,s,6)
Code for clustering data at Level 6
close all
clear all
clc
path(path,'F:Traineeurbanroidatahaar');
X=textread('totaln6.txt');
for i = 1:5
mi = min(X(:,i));
ma = max(X(:,i));
X(:,i) = (X(:,i)-mi)/(ma - mi);
end
msize = [4 4];
lattice = 'hexa'; % hexagonal lattice
neigh = 'gaussian'; % neighborhood function
radius_coarse = [4 .5]; % [initial final]
trainlen_coarse = 50; % cycles in coarse training
radius_fine = [.5 .5]; % [initial final]
trainlen_fine = 10; % cycles in fine training
smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ...
'sheet');
smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ...
trainlen_coarse, 'neigh', neigh);
som_cplane('rect',msize, 'none')
sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ...
trainlen_fine, 'neigh', neigh);
labels1 = 'CCCCC';
labels2 = '12345';
M = sm.codebook;
norms2 = sum(M.*M,2);
op_mat=zeros(5,2);
t=1; % input serial no.
for u=1:36
X1 = X(u,:)';
Y = norms2 - 2*M*X1;
[C,c] = min(Y);
op_mat(u,1)=t;
op_mat(u,2)=c;
ch = mod(c-1,4) + 1;
cv = floor((c-1)/4) + 1;
if mod(cv,2) == 1
shift1 = -.15;
else
shift1 = .35;
end
if u==9 || u==11
shift2 = -.3;
elseif u==5 || u==14
shift2 = .3;
else
shift2 = 0;
end
% text(ch+shift1,cv+shift2,[labels1(u) ...
% labels2(u)], 'FontSize',15);
text(ch,cv+shift2,[labels1(u) ...
labels2(u)], 'FontSize',15);
t=t+1;
end
op_mat
Fig 5.21. Clustering at Level 6
Conclusion: In the above figure, each label is a different class (co- or cross-polarization) derived
from the wavelet coefficients. C3 and C5 are mapped close to each other, so they share the same
properties.
6. Additional Feature of SOM
A graphic display based on these distances, called the U matrix, is explained in this section.
The clustering tendency of the data, or of the models that describe them, can be shown graphically
based on the magnitudes of the vectorial distances between neighboring models in the map, as
shown in Fig. 6.1 below. In it, the number of hexagonal cells has first been increased from 18
by 17 to 35 by 33, in order to create blank interstitial cells that can be colored by shades of
gray or by pseudocolors to emphasize the cluster borders. This creation of the interstitial cells
is made automatically, when the instructions defined below are executed.
The U matrix. A graphic display called the U matrix has been developed by Ultsch [87], as
well as Kraaijveld et al. [47], to illustrate the degree of clustering tendency on the SOM. In the
basic method, interstitial (e.g. hexagonal in the present example) cells are added between the
original SOM cells in the display. So, if we have an SOM array of 18 by 17 cells, after addition
of the new cells the array size becomes 35 by 33. Notice, however, that the extra cells are not
involved in the SOM algorithm, only the original 18 by 17 cells were trained. The average
(smoothed) distances between the nearest SOM models are represented by light colors for small
mean differences, and darker colors for larger mean differences. A ”cluster landscape,” formed
over the SOM, then visualizes the degree of classification. The groundwork for the U matrix is
generated by the instructions
colormapigray = ones(64,3) - colormap('gray');
colormap(colormapigray);
msize = [18 17];
Um = som_umat(sm);
som_cplane('hexaU', sm.topol.msize, Um(:));
In this case we need not draw the blank SOM groundwork with the instruction
som_cplane('hexa', msize, 'none'). The example with the countries in August 2014 will now be
represented together with the U-matrix "landscape" in Fig. 6.1; it shows darker "ravines" between
the clusters. Annotation of the SOM nodes: after that, the country acronyms are written on the
display with the text instruction. The complete map is shown in Fig. 6.1.
Fig. 6.1. The SOM, with the U matrix, of the financial status of 50 countries
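The annotation step itself is not listed in this section. A sketch consistent with the earlier clustering scripts is given below; the label cell array acronyms and the vector of best-matching-unit indices bmus are assumed to exist from earlier steps, and the mapping of unit indices to U-matrix coordinates is my own reading of the layout, not code from the report.
% Assumed helper: write each country acronym at its winning node on the U-matrix
% display. The U matrix of an 18-by-17 map has 35-by-33 cells, so the original
% units sit at every second position (2*i - 1) of the enlarged grid.
msize = [18 17];
for u = 1:numel(bmus)
    c  = bmus(u);                           % best-matching unit of country u
    ch = mod(c-1, msize(1)) + 1;            % position along the first map dimension
    cv = floor((c-1) / msize(1)) + 1;       % position along the second map dimension
    text(2*ch - 1, 2*cv - 1, acronyms{u}, 'FontSize', 10);
end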
A particular remark is due here. Notice that there are plenty of unlabelled areas in the SOM.
During training, when the model vectors had not yet reached their final values, the "winner nodes"
for certain countries were located in completely different places than where they finally ended up;
nonetheless, the models at these nodes also gathered memory traces during training. Thus, these
nodes have learned more or less wrong and even random values during the coarse training,
with the result that the vectorial differences of the models in those places are large. So the SOM
with unique items mapped on it and having plenty of blank space between them is not
particularly suitable for the demonstration of the U matrix, although some interesting details
can be found in it.
7. Future Enhancements
Artificial neural networks are currently an active research area in data compression. The GSOM
algorithm for data compression is rapidly finding use in many applied fields and technologies, but
it still has limitations: it cannot efficiently compress large audio and video files. Improving the
compression ratio for large audio and video files is therefore left as a future enhancement.
8. Bibliography and References
1. IJCSNS International Journal of Computer Science and Network Security, VOL.14
No.11, November 2014
2. https://en.wikipedia.org/wiki/Haar_wavelet
3. https://en.wikipedia.org/wiki/Wavelet
4. https://en.wikipedia.org/wiki/Cross-polarized_wave_generation
5. http://www.edmundoptics.com/resources/application-notes/imaging/what-is-swir/
6. https://en.wikipedia.org/wiki/Near-infrared_spectroscopy
7. https://en.wikipedia.org/wiki/Additive_white_Gaussian_noise
8. https://en.wikipedia.org/wiki/Visible_spectrum
9. https://en.wikipedia.org/wiki/Visible_spectrum#/media/File:Linear_visible_spectrum.
svg
10. Kohonen, T., MATLAB Implementations and Applications of the Self-Organizing
Map, Unigrafia Oy, Helsinki, Finland, 2014.
11. Introduction to Kohonen's Self Organising Maps, Fernando Bacao and Victor Lobo
More Related Content

Viewers also liked

Bioinformatics Project Training for 2,4,6 month
Bioinformatics Project Training for 2,4,6 monthBioinformatics Project Training for 2,4,6 month
Bioinformatics Project Training for 2,4,6 monthbiinoida
 
IEEE- Intrusion Detection Model using Self Organizing Map
IEEE- Intrusion Detection Model using Self Organizing MapIEEE- Intrusion Detection Model using Self Organizing Map
IEEE- Intrusion Detection Model using Self Organizing MapTushar Shinde
 
Timo Honkela: Kohonen's Self-Organizing Maps for Intelligent Systems Developm...
Timo Honkela: Kohonen's Self-Organizing Maps for Intelligent Systems Developm...Timo Honkela: Kohonen's Self-Organizing Maps for Intelligent Systems Developm...
Timo Honkela: Kohonen's Self-Organizing Maps for Intelligent Systems Developm...Timo Honkela
 
Timo Honkela: Turning quantity into quality and making concepts visible using...
Timo Honkela: Turning quantity into quality and making concepts visible using...Timo Honkela: Turning quantity into quality and making concepts visible using...
Timo Honkela: Turning quantity into quality and making concepts visible using...Timo Honkela
 
B.sc biochem i bobi u 4 gene prediction
B.sc biochem i bobi u 4 gene predictionB.sc biochem i bobi u 4 gene prediction
B.sc biochem i bobi u 4 gene predictionRai University
 
Timo Honkela: Self-Organizing Map as a Means for Gaining Perspectives
Timo Honkela: Self-Organizing Map as a Means for Gaining PerspectivesTimo Honkela: Self-Organizing Map as a Means for Gaining Perspectives
Timo Honkela: Self-Organizing Map as a Means for Gaining PerspectivesTimo Honkela
 
"k-means-clustering" presentation @ Papers We Love Bucharest
"k-means-clustering" presentation @ Papers We Love Bucharest"k-means-clustering" presentation @ Papers We Love Bucharest
"k-means-clustering" presentation @ Papers We Love BucharestAdrian Florea
 
gene regulation sdk 2013
gene regulation sdk 2013gene regulation sdk 2013
gene regulation sdk 2013Dr-HAMDAN
 
Neural networks Self Organizing Map by Engr. Edgar Carrillo II
Neural networks Self Organizing Map by Engr. Edgar Carrillo IINeural networks Self Organizing Map by Engr. Edgar Carrillo II
Neural networks Self Organizing Map by Engr. Edgar Carrillo IIEdgar Carrillo
 
Neural Networks: Self-Organizing Maps (SOM)
Neural Networks:  Self-Organizing Maps (SOM)Neural Networks:  Self-Organizing Maps (SOM)
Neural Networks: Self-Organizing Maps (SOM)Mostafa G. M. Mostafa
 
Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Vijay Hemmadi
 
Methods of Manifold Learning for Dimension Reduction of Large Data Sets
Methods of Manifold Learning for Dimension Reduction of Large Data SetsMethods of Manifold Learning for Dimension Reduction of Large Data Sets
Methods of Manifold Learning for Dimension Reduction of Large Data SetsRyan B Harvey, CSDP, CSM
 

Viewers also liked (20)

Self Organizing Maps
Self Organizing MapsSelf Organizing Maps
Self Organizing Maps
 
Bioinformatics Project Training for 2,4,6 month
Bioinformatics Project Training for 2,4,6 monthBioinformatics Project Training for 2,4,6 month
Bioinformatics Project Training for 2,4,6 month
 
Self-organizing map
Self-organizing mapSelf-organizing map
Self-organizing map
 
IEEE- Intrusion Detection Model using Self Organizing Map
IEEE- Intrusion Detection Model using Self Organizing MapIEEE- Intrusion Detection Model using Self Organizing Map
IEEE- Intrusion Detection Model using Self Organizing Map
 
WCNC
WCNCWCNC
WCNC
 
Timo Honkela: Kohonen's Self-Organizing Maps for Intelligent Systems Developm...
Timo Honkela: Kohonen's Self-Organizing Maps for Intelligent Systems Developm...Timo Honkela: Kohonen's Self-Organizing Maps for Intelligent Systems Developm...
Timo Honkela: Kohonen's Self-Organizing Maps for Intelligent Systems Developm...
 
Timo Honkela: Turning quantity into quality and making concepts visible using...
Timo Honkela: Turning quantity into quality and making concepts visible using...Timo Honkela: Turning quantity into quality and making concepts visible using...
Timo Honkela: Turning quantity into quality and making concepts visible using...
 
Analyzing and integrating probabilistic and deterministic computational model...
Analyzing and integrating probabilistic and deterministic computational model...Analyzing and integrating probabilistic and deterministic computational model...
Analyzing and integrating probabilistic and deterministic computational model...
 
B.sc biochem i bobi u 4 gene prediction
B.sc biochem i bobi u 4 gene predictionB.sc biochem i bobi u 4 gene prediction
B.sc biochem i bobi u 4 gene prediction
 
IntelliGO semantic similarity measure for Gene Ontology annotations
IntelliGO semantic similarity measure for Gene Ontology annotationsIntelliGO semantic similarity measure for Gene Ontology annotations
IntelliGO semantic similarity measure for Gene Ontology annotations
 
Timo Honkela: Self-Organizing Map as a Means for Gaining Perspectives
Timo Honkela: Self-Organizing Map as a Means for Gaining PerspectivesTimo Honkela: Self-Organizing Map as a Means for Gaining Perspectives
Timo Honkela: Self-Organizing Map as a Means for Gaining Perspectives
 
"k-means-clustering" presentation @ Papers We Love Bucharest
"k-means-clustering" presentation @ Papers We Love Bucharest"k-means-clustering" presentation @ Papers We Love Bucharest
"k-means-clustering" presentation @ Papers We Love Bucharest
 
SCoT and RAPD
SCoT and RAPDSCoT and RAPD
SCoT and RAPD
 
Bioalgo 2012-01-gene-prediction-sim
Bioalgo 2012-01-gene-prediction-simBioalgo 2012-01-gene-prediction-sim
Bioalgo 2012-01-gene-prediction-sim
 
Gene aswa (2)
Gene aswa (2)Gene aswa (2)
Gene aswa (2)
 
gene regulation sdk 2013
gene regulation sdk 2013gene regulation sdk 2013
gene regulation sdk 2013
 
Neural networks Self Organizing Map by Engr. Edgar Carrillo II
Neural networks Self Organizing Map by Engr. Edgar Carrillo IINeural networks Self Organizing Map by Engr. Edgar Carrillo II
Neural networks Self Organizing Map by Engr. Edgar Carrillo II
 
Neural Networks: Self-Organizing Maps (SOM)
Neural Networks:  Self-Organizing Maps (SOM)Neural Networks:  Self-Organizing Maps (SOM)
Neural Networks: Self-Organizing Maps (SOM)
 
Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins Secondary Structure Prediction of proteins
Secondary Structure Prediction of proteins
 
Methods of Manifold Learning for Dimension Reduction of Large Data Sets
Methods of Manifold Learning for Dimension Reduction of Large Data SetsMethods of Manifold Learning for Dimension Reduction of Large Data Sets
Methods of Manifold Learning for Dimension Reduction of Large Data Sets
 

Similar to SOM Implementation for Clustering Remote Sensing Data

RDO_01_2016_Journal_P_Web
RDO_01_2016_Journal_P_WebRDO_01_2016_Journal_P_Web
RDO_01_2016_Journal_P_WebSahl Martin
 
Coates p: the use of genetic programing in exploring 3 d design worlds
Coates p: the use of genetic programing in exploring 3 d design worldsCoates p: the use of genetic programing in exploring 3 d design worlds
Coates p: the use of genetic programing in exploring 3 d design worldsArchiLab 7
 
A Comparison of Traditional Simulation and MSAL (6-3-2015)
A Comparison of Traditional Simulation and MSAL (6-3-2015)A Comparison of Traditional Simulation and MSAL (6-3-2015)
A Comparison of Traditional Simulation and MSAL (6-3-2015)Bob Garrett
 
Rauber, a. 1999: label_som_on the labeling of self-organising maps
Rauber, a. 1999:  label_som_on the labeling of self-organising mapsRauber, a. 1999:  label_som_on the labeling of self-organising maps
Rauber, a. 1999: label_som_on the labeling of self-organising mapsArchiLab 7
 
IEEE Pattern analysis and machine intelligence 2016 Title and Abstract
IEEE Pattern analysis and machine intelligence 2016 Title and AbstractIEEE Pattern analysis and machine intelligence 2016 Title and Abstract
IEEE Pattern analysis and machine intelligence 2016 Title and Abstracttsysglobalsolutions
 
Face Recognition System using Self Organizing Feature Map and Appearance Base...
Face Recognition System using Self Organizing Feature Map and Appearance Base...Face Recognition System using Self Organizing Feature Map and Appearance Base...
Face Recognition System using Self Organizing Feature Map and Appearance Base...ijtsrd
 
Instruction level parallelism using ppm branch prediction
Instruction level parallelism using ppm branch predictionInstruction level parallelism using ppm branch prediction
Instruction level parallelism using ppm branch predictionIAEME Publication
 
developments and applications of the self-organizing map and related algorithms
developments and applications of the self-organizing map and related algorithmsdevelopments and applications of the self-organizing map and related algorithms
developments and applications of the self-organizing map and related algorithmsAmir Shokri
 
mapReduce for machine learning
mapReduce for machine learning mapReduce for machine learning
mapReduce for machine learning Pranya Prabhakar
 
Fractal analysis of good programming style
Fractal analysis of good programming styleFractal analysis of good programming style
Fractal analysis of good programming stylecsandit
 
FRACTAL ANALYSIS OF GOOD PROGRAMMING STYLE
FRACTAL ANALYSIS OF GOOD PROGRAMMING STYLEFRACTAL ANALYSIS OF GOOD PROGRAMMING STYLE
FRACTAL ANALYSIS OF GOOD PROGRAMMING STYLEcscpconf
 
Bengkel smartPLS 2011
Bengkel smartPLS 2011Bengkel smartPLS 2011
Bengkel smartPLS 2011Adi Ali
 
Probabilistic Programming for Dynamic Data Assimilation on an Agent-Based Model
Probabilistic Programming for Dynamic Data Assimilation on an Agent-Based ModelProbabilistic Programming for Dynamic Data Assimilation on an Agent-Based Model
Probabilistic Programming for Dynamic Data Assimilation on an Agent-Based ModelNick Malleson
 
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONGENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONijaia
 
A Framework for Visualizing Association Mining Results
A Framework for Visualizing  Association Mining ResultsA Framework for Visualizing  Association Mining Results
A Framework for Visualizing Association Mining Resultsertekg
 
A novel hybrid deep learning model for price prediction
A novel hybrid deep learning model for price prediction A novel hybrid deep learning model for price prediction
A novel hybrid deep learning model for price prediction IJECEIAES
 
Scene Description From Images To Sentences
Scene Description From Images To SentencesScene Description From Images To Sentences
Scene Description From Images To SentencesIRJET Journal
 
A formal conceptual framework
A formal conceptual frameworkA formal conceptual framework
A formal conceptual frameworkijseajournal
 

Similar to SOM Implementation for Clustering Remote Sensing Data (20)

RDO_01_2016_Journal_P_Web
RDO_01_2016_Journal_P_WebRDO_01_2016_Journal_P_Web
RDO_01_2016_Journal_P_Web
 
Coates p: the use of genetic programing in exploring 3 d design worlds
Coates p: the use of genetic programing in exploring 3 d design worldsCoates p: the use of genetic programing in exploring 3 d design worlds
Coates p: the use of genetic programing in exploring 3 d design worlds
 
A Comparison of Traditional Simulation and MSAL (6-3-2015)
A Comparison of Traditional Simulation and MSAL (6-3-2015)A Comparison of Traditional Simulation and MSAL (6-3-2015)
A Comparison of Traditional Simulation and MSAL (6-3-2015)
 
Rauber, a. 1999: label_som_on the labeling of self-organising maps
Rauber, a. 1999:  label_som_on the labeling of self-organising mapsRauber, a. 1999:  label_som_on the labeling of self-organising maps
Rauber, a. 1999: label_som_on the labeling of self-organising maps
 
IEEE Pattern analysis and machine intelligence 2016 Title and Abstract
IEEE Pattern analysis and machine intelligence 2016 Title and AbstractIEEE Pattern analysis and machine intelligence 2016 Title and Abstract
IEEE Pattern analysis and machine intelligence 2016 Title and Abstract
 
Face Recognition System using Self Organizing Feature Map and Appearance Base...
Face Recognition System using Self Organizing Feature Map and Appearance Base...Face Recognition System using Self Organizing Feature Map and Appearance Base...
Face Recognition System using Self Organizing Feature Map and Appearance Base...
 
Instruction level parallelism using ppm branch prediction
Instruction level parallelism using ppm branch predictionInstruction level parallelism using ppm branch prediction
Instruction level parallelism using ppm branch prediction
 
developments and applications of the self-organizing map and related algorithms
developments and applications of the self-organizing map and related algorithmsdevelopments and applications of the self-organizing map and related algorithms
developments and applications of the self-organizing map and related algorithms
 
mapReduce for machine learning
mapReduce for machine learning mapReduce for machine learning
mapReduce for machine learning
 
Fractal analysis of good programming style
Fractal analysis of good programming styleFractal analysis of good programming style
Fractal analysis of good programming style
 
FRACTAL ANALYSIS OF GOOD PROGRAMMING STYLE
FRACTAL ANALYSIS OF GOOD PROGRAMMING STYLEFRACTAL ANALYSIS OF GOOD PROGRAMMING STYLE
FRACTAL ANALYSIS OF GOOD PROGRAMMING STYLE
 
Bengkel smartPLS 2011
Bengkel smartPLS 2011Bengkel smartPLS 2011
Bengkel smartPLS 2011
 
Probabilistic Programming for Dynamic Data Assimilation on an Agent-Based Model
Probabilistic Programming for Dynamic Data Assimilation on an Agent-Based ModelProbabilistic Programming for Dynamic Data Assimilation on an Agent-Based Model
Probabilistic Programming for Dynamic Data Assimilation on an Agent-Based Model
 
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATIONGENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
GENETIC ALGORITHM FOR FUNCTION APPROXIMATION: AN EXPERIMENTAL INVESTIGATION
 
A Framework for Visualizing Association Mining Results
A Framework for Visualizing  Association Mining ResultsA Framework for Visualizing  Association Mining Results
A Framework for Visualizing Association Mining Results
 
A novel hybrid deep learning model for price prediction
A novel hybrid deep learning model for price prediction A novel hybrid deep learning model for price prediction
A novel hybrid deep learning model for price prediction
 
som
somsom
som
 
Pca seminar final report
Pca seminar final reportPca seminar final report
Pca seminar final report
 
Scene Description From Images To Sentences
Scene Description From Images To SentencesScene Description From Images To Sentences
Scene Description From Images To Sentences
 
A formal conceptual framework
A formal conceptual frameworkA formal conceptual framework
A formal conceptual framework
 

More from Daksh Raj Chopra

Prove/disprove of microphone used for targeting Ads
Prove/disprove of microphone used for targeting Ads Prove/disprove of microphone used for targeting Ads
Prove/disprove of microphone used for targeting Ads Daksh Raj Chopra
 
Foundations for New Champlain Bridge Corridor Project
Foundations for New Champlain Bridge Corridor ProjectFoundations for New Champlain Bridge Corridor Project
Foundations for New Champlain Bridge Corridor ProjectDaksh Raj Chopra
 
Foundations for New Champlain Bridge Corridor Project
Foundations for New Champlain Bridge Corridor ProjectFoundations for New Champlain Bridge Corridor Project
Foundations for New Champlain Bridge Corridor ProjectDaksh Raj Chopra
 
Internet of things (IoT) Architecture Security Analysis
Internet of things (IoT) Architecture Security AnalysisInternet of things (IoT) Architecture Security Analysis
Internet of things (IoT) Architecture Security AnalysisDaksh Raj Chopra
 
8 bit Multiplier Accumulator
8 bit Multiplier Accumulator8 bit Multiplier Accumulator
8 bit Multiplier AccumulatorDaksh Raj Chopra
 
Simulation of a Wireless Sub Network using QualNET
Simulation of a Wireless Sub Network using QualNETSimulation of a Wireless Sub Network using QualNET
Simulation of a Wireless Sub Network using QualNETDaksh Raj Chopra
 
DTMF based Home Automation System
DTMF based Home Automation SystemDTMF based Home Automation System
DTMF based Home Automation SystemDaksh Raj Chopra
 
Advance Microcontroller AVR
Advance Microcontroller AVRAdvance Microcontroller AVR
Advance Microcontroller AVRDaksh Raj Chopra
 
DTMF based Home Applicance System
DTMF based Home Applicance SystemDTMF based Home Applicance System
DTMF based Home Applicance SystemDaksh Raj Chopra
 

More from Daksh Raj Chopra (11)

Prove/disprove of microphone used for targeting Ads
Prove/disprove of microphone used for targeting Ads Prove/disprove of microphone used for targeting Ads
Prove/disprove of microphone used for targeting Ads
 
Foundations for New Champlain Bridge Corridor Project
Foundations for New Champlain Bridge Corridor ProjectFoundations for New Champlain Bridge Corridor Project
Foundations for New Champlain Bridge Corridor Project
 
Foundations for New Champlain Bridge Corridor Project
Foundations for New Champlain Bridge Corridor ProjectFoundations for New Champlain Bridge Corridor Project
Foundations for New Champlain Bridge Corridor Project
 
Maggi noodles Case Study
Maggi noodles Case StudyMaggi noodles Case Study
Maggi noodles Case Study
 
Internet of things (IoT) Architecture Security Analysis
Internet of things (IoT) Architecture Security AnalysisInternet of things (IoT) Architecture Security Analysis
Internet of things (IoT) Architecture Security Analysis
 
8 bit Multiplier Accumulator
8 bit Multiplier Accumulator8 bit Multiplier Accumulator
8 bit Multiplier Accumulator
 
Simulation of a Wireless Sub Network using QualNET
Simulation of a Wireless Sub Network using QualNETSimulation of a Wireless Sub Network using QualNET
Simulation of a Wireless Sub Network using QualNET
 
Safety guard for blind
Safety guard for blindSafety guard for blind
Safety guard for blind
 
DTMF based Home Automation System
DTMF based Home Automation SystemDTMF based Home Automation System
DTMF based Home Automation System
 
Advance Microcontroller AVR
Advance Microcontroller AVRAdvance Microcontroller AVR
Advance Microcontroller AVR
 
DTMF based Home Applicance System
DTMF based Home Applicance SystemDTMF based Home Applicance System
DTMF based Home Applicance System
 

Recently uploaded

Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxabhijeetpadhi001
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Jisc
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 

Recently uploaded (20)

Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)ESSENTIAL of (CS/IT/IS) class 06 (database)
ESSENTIAL of (CS/IT/IS) class 06 (database)
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
MICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptxMICROBIOLOGY biochemical test detailed.pptx
MICROBIOLOGY biochemical test detailed.pptx
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...Procuring digital preservation CAN be quick and painless with our new dynamic...
Procuring digital preservation CAN be quick and painless with our new dynamic...
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 

SOM Implementation for Clustering Remote Sensing Data

  • 1. 1 TRAINING REPORT OF SIX MONTHS INDUSTRIAL TRAINING, UNDERTAKEN AT DEFENCE RESEARCH AND DEVELOPMENT ORGANIZATION IN DEFENCE TERRAIN RESEARCH LABORATORY ON MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSING DATA SUBMITTED IN PARTIAL FULFILLMENT OF THE DEGREE OF B.E. (ECE) Under the Guidance of: Submitted By: Name: T S RAWAT Name: DAKSH RAJ CHOPRA Designation: SCIENTIST ‘E’ University ID No.: B120020081 Department: DEFENCE TERRAIN RESEARCH LABORATORY Chitkara University, Himachal Pradesh, India.
  • 2. 2 ACKNOWLEDGEMENT I take this opportunity to express my profound gratitude and deep regards to my guide Sc. T S Rawat for his exemplary guidance, monitoring and constant encouragement throughout the course of this thesis. The blessing, help and guidance given by him time to time shall carry me a long way in the journey of life on which I am about to embark. I also take this opportunity to express a deep sense of gratitude to Sc. Sujata Das, DTRL for her cordial support, valuable information and guidance, which helped me in completing this task through various stages. I am obliged to staff members of DTRL for the valuable information provided by them in their respective fields. I am grateful for their cooperation during the period of my assignment. Lastly, I thank almighty, my parents, brother, sisters and friends for their constant encouragement without which this assignment would not be possible.
  • 3. 3 PREFACE There is a famous saying “The theory without practical is lame and practical without theory is blind.” This project is the result of one semester training at Defence Terrain Research Laboratory. This internship is an integral part of “Engineering” course and it aims at providing a first-hand experience of industry to students. The following practical experience helps the students to view the industrial work and knowledge closely. I was really fortunate of getting an opportunity to pursue my Summer Training in reputed, well established, fast growing and professionally managed organization like Defence Terrain Research Laboratory assigned “MATLAB IMPLEMENTATION OF SELF-ORGANIZING MAPS FOR CLUSTERING OF REMOTE SENSING DATA.”
  • 4. 4 CONTENTS Sr. No. Topic Page No. 1. Project Undertaken 5 2. Introduction 6 3. Requirement Analysis with Illustrations 10 4. SOM Algorithm 16 5. Experiments during training 20 6. Additional feature of SOM 62 7. Future Enhancement 64 8. Bibliography and References 64
  • 5. 5 1. Project Undertaken My internship was based on a single objective of Self Organizing Maps (SOM). Now first thing that comes in our mind is that why this topic. Why do we need SOM and what does it do. All that will be told in this report. There existed a projection known as Sammon Projection which showed image of data items having same attributes closer to each other and different one having more distance. But it could not convert high dimensional to low dimensional image. For that we needed SOM which work on machine learning process from the neurons (attributes) provided at the input. The Self-Organizing Map represents a set of high-dimensional data items as a quantized two dimensional image in an orderly fashion. Every data item is mapped into one point (node) in the map, and the distances of the items in the map reflect similarities between the items. SOM used a data mining technique, that it works on the principle of pattern recognize. It uses the idea of adaptive networks that started the Artificial Intelligence research. This further helps in the data analysis. For example while studying 17 different elements, we used to differentiate of the basis of physical properties but SOM learned the values and made a pattern. After that it did a clustering of that 17 elements. The Self-Organizing Map (SOM) is a data-analysis method that visualizes similarity relations in a set of data items. For instance in economy, it has been applied to the comparison of enterprises at different levels of abstraction, to assess their relative financial conditions, and to profile their products and customers. Next comes the industrial applications of this project. In industry, the monitoring of processes, systems and machineries by the SOM method has been a very important application, and there the purpose is to describe the masses of different input states by ordered clusters of typical states. In science and technology at large, there exist unlimited tasks where the research objects must be classified on the basis of their inherent properties, to mention the classification of proteins, genetic sequences and galaxies. Furthermore applications of SOM are:- 1. Statistical methods at large (a) Exploratory data analysis (b) Statistical analysis and organization of texts 2. Industrial analyses, control, and telecommunications 3. Biomedical analyses and applications at large 4. Financial applications
  • 6. 6 2. Introduction 2.1 SOM Representation Only in some special cases the relation of input items with their projection images is one-to-one in the SOM. More often, especially in industrial and scientific applications, the mapping is many-to-one: i.e., the projection images on the SOM are local averages of the input-data distribution, comparable to the k-means averages in classical vector quantization. In the VQ, the local averages are represented by a finite set of codebook vectors. The SOM also uses a finite set of “codebook vectors”, called the models, for the representation of local averages. An input vector is mapped into a particular node on the SOM array by comparing it with all of the models, and the best-matching model, called the winner, is identified, like in VQ. The most essential difference with respect to the k-means clustering, however, is that the models of the SOM also reflect topographic relations between the projection images which are similar to those of the source data. So the SOM is actually a data compression method, which represents the topographic relations of the data space by a finite set of models on a topographic map, in an orderly fashion. In the standard-of-living diagram shown in Fig. 2.1, unique input items are mapped on unique locations on the SOM. However, in the majority of applications, there are usually many statistically distributed variations of the input items, and the projection image that is formed on the SOM then represents clusters of the variations. We shall make this fact more clearly in examples. So, in its genuine form the SOM differs from all of the other nonlinearly projecting methods, because it usually represents a big data set by a much smaller number of models, sometimes also called ”weight vectors” (this latter term comes from the theory of artificial neural networks), arranged as a rectangular array of nodes. Each model has the same number of parameters as the number of features in the input items. However, an SOM model may not be a replica of any input item but only a local average over a subset of items that are most similar to it. In this sense the SOM works like the k-means clustering algorithm, but in addition, in a special learning process, the SOM also arranges the k means into a topographic order according to their similarity relations. The parameters of the models are variable and they are adjusted by learning such that, in relation to the original items, the similarity relations of the models finally approximate or represent the similarity relations of the original items. It is obvious that an insightful view of the complete data base can then be obtained at one glance.
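Since the winner search just described is the operation used throughout the MATLAB experiments later in this report, a minimal illustrative sketch of it is given below. The codebook M, the input x and the 6-by-6 grid are stand-in assumptions for this example, not values taken from the report:

% Illustrative winner (best-matching model) search for one input item
msize = [6 6];                          % [rows cols] of the SOM array
munits = prod(msize);
dim = 12;
M = rand(munits, dim);                  % codebook: one model vector per row
x = rand(1, dim);                       % one input data item
% Winner = node whose model has the smallest Euclidean distance from x
d2 = sum((M - repmat(x, munits, 1)).^2, 2);
[~, c] = min(d2);
% Equivalent shortcut used later in the report's scripts (norms2 - 2*M*X1):
% ||m - x||^2 = ||m||^2 - 2*m*x' + ||x||^2, and ||x||^2 is the same for every
% node, so it can be dropped when searching for the minimum.
[~, c2] = min(sum(M.*M, 2) - 2*M*x');   % c2 equals c (up to ties)
% One possible conversion of the linear index c to grid coordinates
% (this row/column convention is an assumption made for illustration)
col = mod(c-1, msize(2)) + 1;
row = floor((c-1)/msize(2)) + 1;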
  • 7. 7 Fig. 2.1. Structured diagram of the data set chosen to describe the standard of living in 126 countries of the world in the year 1992. The abbreviated country symbols are concentrated onto locations in the (quantized) display computed by the SOM algorithm. The symbols written in capital letters correspond to those 78 countries for which at least 28 indicators out of 39 were given, and they were used in the real computation of the SOM. The symbols written in low case letters correspond to countries for which more than 11 indicator values were missing, and these countries are projected to locations based on the incomplete comparison of their given attributes with those of the 78 countries. 2.2 SOM Display It shall be emphasized that unlike in the other projective methods, in the SOM the representations of the items are not moved anywhere in their “topographic” map for their ordering. Instead, the adjustable parameters of the models are associated with fixed locations of the map once and for all, namely, with the nodes of a regular, usually two-dimensional array (Fig. 2). A hexagonal array, like the pixels on a TV screen, provides the best visualization. Initially the parameters of the models can even have random values. The correct final values of the models or “weight vectors” will develop gradually by learning. The representations, i.e., the models, become more or less exact replica of the input items when their sets of feature parameters are tuned towards the input items during learning. The SOM algorithm constructs the models such that: After learning, more similar models will be associated with nodes that are closer in the array, whereas less similar models will be situated gradually farther away in the array. It may be easier to understand the rather involved learning principles and mathematics of the SOM, if the central idea is first expressed in the following simple illustrative form. Let X denote a general input item, which is broadcast to all nodes for its concurrent comparison with all of the models. Every input data item shall select the model that matches best with the input item, and this model, called the winner, as well as a subset of its spatial neighbors in the array, shall be modified for better matching. Like in the k-means clustering, the modification is concentrated on a selected node that contains the winner model. On the other hand, since a whole spatial neighbourhood around the winner in the array is modified at a time, the degree of local, differential ordering of the models in this neighbourhood, due to a smoothing action,
  • 8. 8 will be increased. The successive, different inputs cause corrections in different subsets of models. The local ordering actions will gradually be propagated over the array. 2.3 SOM – A Neural Model Many principles in computer science have started as models of neural networks. The first computers were nicknamed “giant brains,” and the electronic logic circuits used in the first computers, as contrasted with the earlier electromechanical relay-logic (switching) networks, were essentially nothing but networks of threshold triggers, believed to imitate the alleged operation of the neural cells. The first useful neural-network models were adaptive threshold-logic circuits, in which the signals were weighted by adaptive (”learning”) coefficients. A significant new aspect introduced in the 1960s was to consider collective effects in distributed adaptive networks, which materialized in new distributed associative memory models, multilayer signal- transforming and pattern-classifying circuits, and networks with massive feedbacks and stable eigenstates, which solved certain optimization problems. Against this background, the Self-Organizing Map (SOM) introduced around 1981-82 may be seen as a model of certain cognitive functions, namely, a network model that is able to create organized representations of sensory experiences, like the brain maps in the cerebral cortex and other parts of the central nervous system do. In the first place the SOM gave some hints of how the brain maps could be formed postnatally, without any genetic control. The first demonstrations of the SOM exemplified the adaptive formation of sensory maps in the brain, and stipulated what functional properties are most essential to their formation. In the early 1970s there were big steps made in pattern recognition (PR) techniques. They continued the idea of adaptive networks that started the “artificial intelligence (AI)” research. However, after the introduction of large time-shared computer systems, a lot of computer scientists took a new course in the AI research, developing complex decision rules by ”heuristic programming”, by which it became possible to implement, e.g., expert systems, computerized control of large projects, etc. However, these rules were mainly designed manually. Nonetheless there was a group of computer scientists who were not happy with this approach: they wanted to continue the original ideas, and to develop computational methods for new analytical tasks in information science, such as remote sensing, image analysis in medicine, and speech recognition. This kind of Pattern Recognition was based on mathematical statistics, and with the advent of new powerful computers, it too could be applied to large and important problems. Notwithstanding the connection between AI and PR research broke in the 1970ies, and the AI and PR conferences and societies started to operate separately. Although the Self-Organizing Map research was started in the neural-networks context, its applications were actually developed in experimental pattern-recognition research, which was using real data. It was first found promising in speech recognition, but very soon numerous other applications were found in industry, finance, and science. The SOM is nowadays regarded as a general data-analysis method in a number of fields. Data mining has a special flavor in data analysis. When a new research topic is started, one usually has little understanding of the collected data. 
With time it happens that new, unexpected results or phenomena will be found. The meaning often given to automated data mining is that
  • 9. 9 the method is able to discover new, unexpected and surprising results. Even in this book I have tried to collect simple experiments, in which something quite unexpected will show up. Consider, for instance, Sec. 12 (”The SOM of some metallic elements”) in which we find that the ferromagnetic metals are mapped to a tight cluster; this result was not expected, but the data analysis suggested that the nonmagnetic properties of the metals must have a very strong correlation with the magnetic ones! Or in Sec. 19 (”Two-class separation of mushrooms on the basis of visible attributes”) we are clustering the mushrooms on the basis of their visible attributes only, but this clustering results in a dichotomy of edible vs. poisonous species. In Sec. 13 we are organizing color vectors by the SOM, and if we use a special scaling of the color components, we obtain a color map that coincides with the chromaticity map of human color vision, although this result was in no way expected. Accordingly, it may be safe to say that the SOM is a genuine data mining method, and it will find its most interesting applications in new areas of science and technology. I may still mention a couple of other works from real life. In 1997, Naim et al. published a work [61] that clustered middle-distance galaxies according to their morphology, and found a classification that they characterized as a new finding, compared with the old standard classification performed by Hubble. In Finland, the pulp mill industries were believed to represent the state of the art, but a group of our laboratory was conducting very careful studies of the process by the SOM method [1], and the process experts payed attention to certain instabilities in the flow of pulp through a continuous digester. This instability was not known before, and there was sufficient reason to change the internal structures of the digester so that the instability disappeared. It seems possible that the SOM will open new vistas into the information science. Not only does it already have numerous spin-offs in applications, but its role in the theory of cognition is intriguing. However, its mathematics is still in its infancy and offers new problems especially for mathematical statistics. A lot of high-level research is going on in this area. Maybe it is not exaggerated to assert that the SOM presents a new information processing paradigm, or at least a philosophical line in bioinformatics. To recapitulate, the SOM is a clustering method, but unlike the usual clustering methods, it is also a topography-preserving nonlinear projection mapping. On the other hand, while the other nonlinear projection mappings are also topography-preserving mappings, they do not average data, like the SOM does.
  • 10. 10 3. Requirement Analysis with Illustrations 3.1 Basic Illustration Now in this training our job was to do the clustering of data using Self Organising Map tool. So for that we used the MATLAB software in which SOM ToolBox was used. This tool has various in-build functions for the clustering. But to use a tool we need to first understand the tool and then give the required input. Let me give an example, a small code, to give you some idea of how this tool helps in clustering of data in MATLAB. In this example we have a data of 17 elements having 12 different properties and this whole data was given to the SOM Tool. close all clear all clc X = [2.7 23.8 1.34 5105 .7 187 .22 658 92.4 2450 2800 2.72 6.68 10.8 2.7 3400 .79 16 .051 630.5 38.9 1380 300 39.8 10.5 18.8 .99 2700 .8 357 .056 960.5 25 1930 520 1.58 22.42 6.59 .27 4900 5.2 51.5 .032 2454 28 4800 930 5.3 8.64 31.6 2.25 2300 .51 84 .08 320.5 13 767 240 7.25 8.8 12.6 .54 4720 2.08 60 .1 1489 67 3000 1550 6.8 19.3 14.3 .58 2080 .81 268 .031 1063 15.7 2710 420 2.21 8.92 16.2 .73 3900 1.2 338 .0928 1083 50 2360 1110 1.72 11.34 28.9 2.37 1320 .17 30 .03 327 5.9 1750 220 20.7 1.734 25.5 2.95 4600 .45 145 .25 650 46.5 1097 1350 4.3 8.9 12.9 .53 4970 2.03 53 .108 1450 63 3075 1480 7.35 12.16 11.04 .53 3000 1.15 60 .059 1555 36 2200 950 10.75 21.45 15.23 .36 2690 1.6 60 .032 1773 27 4300 600 10.5 7.86 12.3 .59 5100 2.2 50 .11 1530 66 2500 1520 9.9 7.14 17.1 1.69 3700 .43 97 .092 419.5 26.8 907 430 5.95 7.3 27 1.88 2600 .55 57 .05 231.9 14.2 2270 620 11.3 9.8 13.4 2.92 1800 .32 8.6 .029 271.3 14.1 1490 200 118]; for i = 1:12 mi = min(X(:,i)); ma = max(X(:,i)); X(:,i) = (X(:,i)-mi)/(ma - mi); end msize = [6 6]; lattice = 'hexa'; % hexagonal lattice neigh = 'gaussian'; % neighborhood function radius_coarse = [4 .5]; % [initial final] trainlen_coarse = 50; % cycles in coarse training radius_fine = [.5 .5]; % [initial final] trainlen_fine = 10; % cycles in fine training smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ... 'sheet'); smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ...
  • 11. 11 trainlen_coarse, 'neigh', neigh); som_cplane('hexa',msize, 'none') sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ... trainlen_fine, 'neigh', neigh); labels1 = 'ASAICCACPMNPPFZSB'; labels2 = 'lbgrdouubgidtenni'; M = sm.codebook; norms2 = sum(M.*M,2); op_mat=zeros(17,2); t=1; % input serial no. for u=1:17 X1 = X(u,:)'; Y = norms2 - 2*M*X1; [C,c] = min(Y); op_mat(u,1)=t; op_mat(u,2)=c; ch = mod(c-1,6) + 1; cv = floor((c-1)/6) + 1; if mod(cv,2) == 1 shift1 = -.15; else shift1 = .35; end if u==9 || u==11 shift2 = -.3; elseif u==5 || u==14 shift2 = .3; else shift2 = 0; end % text(ch+shift1,cv+shift2,[labels1(u) ... % labels2(u)], 'FontSize',15); text(ch+shift1,cv+shift2,[labels1(u) ... labels2(u)], 'FontSize',15); t=t+1; end Now, if 17 different elements are given from the periodic table then it should be arranged in 17 different clusters. But the result gave a surprise to all of us. There were some elements in single clusters telling us that there was something similar about those elements.
  • 12. 12 Have a look at the output: Fig 3.1 This is the clustering formed after the SOM tool processed the input given to it. Some of the elements fall in the same clusters because of similar properties. You can see that Nickel, Cobalt and Iron are in the same cluster, since all three are ferromagnetic. Lead and Cadmium also share a cluster. 3.2 Commands in MATLAB In the above code several MATLAB commands help in the clustering of this data. A few of them are: min - finds the minimum value in a vector/matrix; max - finds the maximum value in a vector/matrix; som_lininit - initializes the map linearly along the principal eigenvectors of the data; som_batchtrain - trains the given SOM with the given training data; som_cplane - visualizes a 2D component plane or U-matrix; sum - adds the array elements; mod - gives the remainder after division; floor - rounds a number down to the nearest integer; text - creates a text object at the given position on the axes. Another adjustable part of the tool is the neighbourhood, which we can change according to our needs and the input data. The SOM works on the basis of neurons: when the weights of the neurons change, the output changes accordingly. Let me show this with the help of figures:
  • 13. 13 Fig 3.2 The red dots are the neurons, and the three black dots on either side are the input sequences. Fig 3.3 When the upper sequence is selected, the neurons move towards that side: each neuron calculates its distance and moves towards the sequence with the shorter distance, and the weights of the neurons are changed accordingly.
  • 14. 14 Fig 3.4 Now the other sequence was selected and the neurons changed their weights. So every time an input sequence is selected, the weights of the neurons get changed, and at the end we have the final weights of the neurons. What is this weight of a neuron? The weight is what the machine actually learns: we give the network data, and the neurons learn the pattern and do the clustering accordingly. This is the basic principle behind this kind of artificial intelligence. 3.3 SOM Array Size Maybe it is necessary to state first that the SOM is visualizing the entire input-data space, whereupon its density function ought to become clearly visible. Assume next that we have enough statistical data items to visualize the clustering structures of the data space with sufficient accuracy. Then it should be realized that the SOM is a quantizing method, and has a limited spatial resolution to show the details of the clusters. Sometimes the data set may contain only a few clusters, whereupon a coarse resolution is sufficient. However, if one suspects that there are interesting fine structures in the data, then a larger array is needed for sufficient resolution. Histograms can be displayed on the SOM array: the SOM can be used to represent a histogram, in which the number of input data items mapped onto a node is displayed as a shade of gray, or by a pseudo-color. The statistical accuracy of such a histogram depends on how many input items are mapped per node on the average. A very coarse rule of thumb is that about 50 input-data items per node on the average should be sufficient; otherwise the resolution is limited by the sparsity of the data. So, in visualizing clusters, a compromise must be made between resolution and statistical accuracy. These aspects should be taken into account especially in statistical studies, where only a limited number of samples are available.
  • 15. 15 Sizing the SOM by a trial-and-error method. It is not possible to estimate or even guess the exact size of the array beforehand. It must be determined by the trial-and-error method, after seeing the quality of the first guess. One may have to test several sizes of the SOM to check that the cluster structures are shown with a sufficient resolution and statistical accuracy. Typical SOM arrays range from a few dozen to a few hundred nodes. In special problems, such as the mapping of documents onto the SOM array, even larger maps with, say, thousands of nodes, are used. The largest map produced by us has been the SOM of seven million patent abstracts, for which we constructed a one-million-node SOM. On the other hand, the SOM may be at its best in the visualization of industrial processes, where unlimited amounts of measurements can be recorded. Then the size of the SOM array is not limited by the statistical accuracy but by the computational resources, especially if the SOM has to be constructed periodically in real time, like in the control rooms of factories. 3.4 SOM Array Shape Because the SOM is trying to represent the distribution of high-dimensional data items by a two-dimensional projection image, it may be understandable that the scales of the horizontal and vertical directions of the SOM array should approximately comply with the extensions of the input-data distribution in the two principal dimensions, namely, those two orthogonal directions in which the variances of the data are largest. In complete SOM software packages there is usually an auxiliary function that makes a traditional two-dimensional image of a high- dimensional distribution, e.g., the Sammon projection (cf., e.g. [39] in our SOM Toolbox program package. From its main extensions one can estimate visually what the approximate ratio of the horizontal and vertical sides of the SOM array should be. Special shapes of the array. There exist SOMs in which the array has not been selected as a rectangular sheet. Its topology may resemble, e.g., a cylinder, torus, or a sphere (cf., e.g., [80]). There also exist special SOMs in which the structure and number of nodes of the array is determined dynamically, depending on the received data; cf., e.g., [18]. The special topologies, although requiring more cumbersome displays, may sometimes be justified, e.g., for the following reasons. 1. The SOM is sometimes used to define the control conditions in industrial processes or machineries automatically, directly controlling the actuators. A problem may occur with the boundaries of the SOM sheet: there are distortions and discontinuities, which affect the control stability. The toroidal topology seems to solve this problem, because there are then no boundaries in the SOM. A similar effect is obtained by the spherical topology of the SOM. (Cf. Subsections.4.5 and 5.3, however.) 2. There may exist data, which are cyclic by their nature. One may think, for example of the application of the SOM in musicology, where the degrees of the scales repeat by octaves. Either the cylindrical or toroidal topology will then map the tones cyclically onto the SOM. The dynamical topology, which adjusts itself to structured data, is very interesting in itself. There is one particular problem, however: one must be able to define the condition on which a new structure (branching or cutting of the SOM network) is due. There do not exist universal conditions of this type, and any numerical limit can only be defined arbitrarily. 
Accordingly, the generated structure is then not unique. This same problem is encountered in other neural network models.
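Before moving on to the algorithm itself, the sizing and shape advice of Sections 3.3 and 3.4 can be turned into a small, purely illustrative sketch. The 5*sqrt(n) rule of thumb and the eigenvalue-ratio estimate of the aspect ratio below are common heuristics assumed for this example; they are not prescriptions taken from this report:

X = rand(2000, 12);                     % stand-in n-by-dim training data
[n, dim] = size(X);
munits = ceil(5*sqrt(n));               % heuristic total number of map nodes
% Estimate the aspect ratio of the array from the two largest principal
% extensions of the data (cf. the Sammon-projection advice above)
e = sort(eig(cov(X)), 'descend');
ratio = sqrt(e(1)/e(2));                % desired ratio of the array sides
rows = max(1, round(sqrt(munits*ratio)));
cols = max(1, round(munits/rows));
msize = [rows cols];                    % could then be passed as 'msize' to som_lininit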
  • 16. 16 4. SOM Algorithm The first SOMs were constructed by a stepwise-recursive learning algorithm, where, at each step, a selected patch of models in the SOM array was tuned towards the given input item, one item at a time. Consider again Fig. 2. Let the input data items X this time represent a sequence {x(t)} of real n-dimensional Euclidean vectors x, where t, an integer, signifies a step in the sequence. Let the model vectors, being variable, successively attain the values of another sequence {m_i(t)} of n-dimensional real vectors that represent the successively computed approximations of model m_i. Here i is the spatial index of the node with which m_i is associated. The original SOM algorithm assumes that the following process converges and produces the wanted ordered values for the models: m_i(t + 1) = m_i(t) + h_ci(t) [x(t) − m_i(t)] , (1) where h_ci(t) is called the neighborhood function. The neighborhood function has the most central role in self-organization. This function resembles the kernel that is applied in usual smoothing processes. However, in the SOM, the subscript c is the index of a particular node (the winner) in the array, namely, the one whose model m_c(t) has the smallest Euclidean distance from x(t): c = argmin_i {||x(t) − m_i(t)||} . (2) Equations (1) and (2) define a recursive step in which the input data item x(t) first selects the best-matching model (the winner) according to Eq. (2); then, according to Eq. (1), the model at this node as well as the models at its spatial neighbors in the array are modified. The modifications always take place in such a direction that the modified models match the input better. The rates of the modifications at different nodes depend on the mathematical form of the function h_ci(t). A much-applied choice for the neighborhood function h_ci(t) is h_ci(t) = α(t) exp[−sqdist(c, i)/(2σ²(t))] , (3) where α(t) < 1 is a monotonically (e.g., hyperbolically, exponentially, or piecewise linearly) decreasing scalar function of t, sqdist(c, i) is the square of the geometric distance between the nodes c and i in the array, and σ(t) is another monotonically decreasing function of t. The exact mathematical form of σ(t) is not crucial, as long as its value is fairly large in the beginning of the process, say, on the order of 20 per cent of the longer side of the SOM array, after which it is gradually reduced to a small fraction of it, usually in a few thousand steps. The topographic order develops during this period. After this initial phase of coarse ordering, the final convergence to nearly optimal values of the models takes place in, say, an order of magnitude more steps, whereupon α(t) attains values on the order of 0.01. For sufficient statistical accuracy, every model must be updated sufficiently often. However, a warning must be given: the final value of σ shall never go to zero, because otherwise the process loses its ordering power. It should always remain, say, above half of the array spacing. In very large SOM arrays, the final value of σ may be on the order of five per cent of the shorter side of the array. There are also other possible choices for the mathematical form of h_ci(t). One of them, the "bubble" form, is very simple; in it we have h_ci = 1 up to a certain radius from the winner, and zero otherwise.
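To make Eqs. (1)-(3) concrete, a minimal MATLAB sketch of the stepwise-recursive rule is given below. It is only an illustration of the mathematics, not the SOM Toolbox implementation used elsewhere in this report, and the data, map size and the schedules chosen for α(t) and σ(t) are arbitrary assumptions:

X = rand(1000, 3);                      % training items, one per row
msize = [10 10];                        % SOM array size
munits = prod(msize);
dim = size(X, 2);
% Node coordinates on a rectangular grid, for sqdist(c, i) in Eq. (3)
[gy, gx] = ndgrid(1:msize(1), 1:msize(2));
G = [gx(:) gy(:)];
M = rand(munits, dim);                  % random initialization of the models
T = 20000;                              % number of learning steps
for t = 1:T
    x = X(randi(size(X,1)), :);                       % pick an input x(t)
    % Eq. (2): the winner c has the smallest Euclidean distance from x(t)
    [~, c] = min(sum((M - repmat(x, munits, 1)).^2, 2));
    % Decreasing learning rate and neighborhood radius (illustrative schedules)
    alpha = 0.5*(1 - t/T) + 0.01;
    sigma = max(0.5, 0.2*max(msize)*(1 - t/T));
    % Eq. (3): Gaussian neighborhood function h_ci(t)
    sqdist = sum((G - repmat(G(c,:), munits, 1)).^2, 2);
    h = alpha * exp(-sqdist/(2*sigma^2));
    % Eq. (1): move every model towards x(t), weighted by h_ci(t)
    M = M + repmat(h, 1, dim) .* (repmat(x, munits, 1) - M);
end

Note how the final radius in this sketch is kept at half of the array spacing, in line with the warning above that σ should never be allowed to go to zero.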
  • 17. 17 4.1 Stable state of the learning process In the stationary state of learning, every model is the average of input items projected into its neighborhood, weighted by the neighborhood function. Assuming that the convergence to some stable state of the SOM is true, we require that the expectation values of m_i(t + 1) and m_i(t) for t → ∞ must be equal, while h_ci is nonzero, where c = c(x(t)) is the index of the winner node for input x(t). In other words we must have ∀i, E_t{h_ci (x(t) − m_i(t))} = 0 . (4) Here E_t is the mathematical expectation value operator over t. In the assumed asymptotic state, for t → ∞, the m_i(t) are independent of t and are denoted by m*_i. If the expectation values E_t(·) are written, for t → ∞, as (1/t) Σ_t (·), we can write m*_i = Σ_t h_ci(t) x(t) / Σ_t h_ci(t) . (5) This, however, is still an implicit expression, since c depends on x(t) and the m_i, and must be solved iteratively. Nonetheless, Eq. (5) shall be used for the motivation of the iterative solution for the m_i, known as the batch computation of the SOM ("Batch Map"). 4.2 Initialization of the models The learning process can be started with random vectors as the initial values of the model vectors, but learning is sped up significantly if certain regular initial values are given to the models. A special question concerns the selection of the initial values for the m_i. It has been demonstrated by [39] that they can be selected even as random vectors, but a significantly faster convergence follows if the initial values constitute a regular, two-dimensional sequence of vectors taken along a hyperplane spanned by the two largest principal components of x (i.e., the principal components associated with the two highest eigenvalues); cf. [39]. This method is called linear initialization. The initialization of the models as random vectors was originally used only to demonstrate the capability of the SOM to become ordered, starting from an arbitrary initial state. In practical applications one expects to achieve the final ordering as quickly as possible, so the selection of a good initial state may speed up the convergence of the algorithms by orders of magnitude. 4.3 Point density of the models (one-dimensional case) It was stated in Subsec. 4.1 that in the stationary state of learning, every model vector is the average of input items projected into its neighborhood, weighted by the neighborhood function. However, this condition does not yet tell anything about the distribution of the model vectors, or their point density.
  • 18. 18 To clarify what is thereby meant, we have to revert to the classical vector quantization, or the k-means algorithm [19], [20], which differs from the SOM in that only the winners are updated in training; in other words, the "neighborhood function" h_ci in k-means learning is equal to δ_ci, where δ_ci = 1 if c = i, and δ_ci = 0 if c ≠ i. No topographic order of the models is produced in the classical vector quantization, but its mathematical theory is well established. In particular, it has been shown that the point density q(x) of its model vectors depends on the probability density function p(x) of the input vectors such that (in the Euclidean metric) q(x) = C · p(x)^(1/3) , where C is a scalar constant. No similar result has been derived for general vector dimensionalities in the SOM. In the case that (i) the input items are scalar-valued, (ii) the SOM array is linear, i.e., a one-dimensional chain, (iii) the neighborhood function is a box function with N neighbors on each side of the winner, and (iv) the SOM contains a very large number of model vectors over a finite range, Ritter and Schulten [74] have derived the following formula, where C is some constant: q(x) = C · p(x)^r , where r = 2/3 − 1/(3N² + 3(N + 1)²) . For instance, when N = 1 (one neighbor on each side of the winner), we have r = 0.60. For Gaussian neighborhood functions, Dersch and Tavan have derived a similar result [15]. In other words, the point density of the models in the display is usually related to the probability density function of the inputs, but not linearly: it is flatter. This phenomenon is not harmful; as a matter of fact it means that the SOM display is more uniform than it would be if it represented the exact probability density of the inputs. 4.4 Border effects of the SOM Another salient effect in the SOM, namely, in the planar sheet topology of the array, is that the point density of the models at the borders of the array is a little distorted. This effect is illustrated in the case in which the probability density of input is constant in a certain domain (support of p(x)) and zero outside it. Consider again first the one-dimensional input and linear SOM array. In Fig. 4.1 we show a one-dimensional domain (support) [a, b], over which the probability density function p(x) of a scalar-valued input variable x is constant. The inputs to the SOM algorithm are picked up from this support at random. The set of ordered scalar-valued models μi of the resulting one-dimensional SOM has been rendered on the x axis. Fig. 4.1 Ordered model values μi over a one-dimensional domain.
  • 19. 19 Fig. 4.2. Converged model values μi over a one-dimensional domain of different lengths. Numerical values of the μi for different lengths of the SOM array are shown in Fig.4.2. It is discernible that the μi in the middle of the array are reasonably evenly spaced, but close to the borders the first distance from the border is bigger, the second spacing is smaller, and the next spacing is again larger than the average. This effect is explained by Fig. 3: in equilibrium, in the middle of the array, every μi must coincide with the centroid of set Si, where Si represents all values of x that will be mapped to node i or its neighbors i − 1 and i + 1. However, at the borders, all values of x that are mapped to node 1 will be in the interval ranging from a to (μ2 + μ3)/2, and those values of x that will be mapped to node 2 range from a to (μ3 + μ4)/2. Similar relations can be shown to hold near the end of the chain for μl−1 and μl. Clearly the definition of the above intervals is unsymmetric in the middle of the array and at the borders, which makes the spacings different.
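The border distortion described above is easy to reproduce numerically. The following sketch trains a one-dimensional SOM on scalar inputs drawn uniformly from [a, b] and prints the spacings of the converged model values; the number of models, the number of steps and the schedules are illustrative assumptions only:

a = 0; b = 1; l = 10;                   % support [a, b] and number of models
mu = linspace(a, b, l);                 % ordered initial model values
T = 50000;                              % learning steps
for t = 1:T
    x = a + (b - a)*rand;                             % scalar input x(t) ~ U(a, b)
    [~, c] = min(abs(mu - x));                        % winner node, Eq. (2)
    alpha = 0.05*(1 - t/T) + 0.005;                   % decreasing learning rate
    sigma = max(0.6, 2*(1 - t/T));                    % decreasing radius, kept above 0.5
    h = alpha * exp(-((1:l) - c).^2/(2*sigma^2));     % neighborhood function, Eq. (3)
    mu = mu + h .* (x - mu);                          % update rule, Eq. (1)
end
disp(diff(mu))    % spacings are roughly even in the middle, distorted near the borders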
  • 20. 20 5. Experiments during training 5.1 Basic Code of Clustering of Random Colours As a first small step, a code was tried in which 10,000 colour vectors were generated at random and given as input to the SOM tool. There are two radius parameters that we have to supply so that the SOM can compute distances and assign each item to the required cluster: radius_coarse and radius_fine. In radius_coarse we give the initial and final neighbourhood radius for the coarse training phase, during which the map checks the neighbours for similar properties and places items in the desired cluster; radius_fine then gives the radius values for the final, fine-tuning phase. In other words, we start with a larger circle for finding neighbours and then shrink it. All of this is chosen by the user according to the needs of the data. close clear all clc msize = [25 25]; lattice = 'rect'; radius_coarse = [10 7]; radius_fine = [7 5]; trainlen_coarse = 50; trainlen_fine = 50; X = rand(10000,3); % random input (training) vectors to the SOM smI = som_lininit(X,'msize',msize,'lattice',lattice,'shape', ... 'sheet'); smC = som_batchtrain(smI,X,'radius',radius_coarse,'trainlen', ... trainlen_coarse, 'neigh','gaussian'); sm = som_batchtrain(smC,X,'radius',radius_fine,'trainlen', ... trainlen_fine, 'neigh','gaussian'); M = sm.codebook; C = zeros(25,25,3); for i = 1:25 for j = 1:25 p = 25*(i-1) + j; C(i,j,1) = M(p,1); C(i,j,2) = M(p,2); C(i,j,3) = M(p,3); end end image(C)
  • 21. 21 Fig 5.1. Here colours of the same shade lie near each other and clusters are formed accordingly. 5.2 Hyperspectral Analysis of Soil In this experiment we had hyperspectral data for 25 different soils, giving the spectral values of each soil at various wavelengths. We chose three different wavelength ranges – Visible, Near Infrared and Shortwave Infrared. Aim: To take a range of spectral values and perform a SOM on it to make a spectral library. Procedure: To identify which soil an unknown spectrum belongs to, we added white Gaussian noise at different SNR levels to the data and checked which noisy spectra the SOM clusters together with the original soil value. 5.2.1 Soil Type – I Very Dark Grayish Brown Silty Loam 5.2.1.1 Visible Range The visible spectrum is the portion of the electromagnetic spectrum that is visible to the human eye. Electromagnetic radiation in this range of wavelengths is called visible light or simply light. A typical human eye will respond to wavelengths from about 390 to 700 nm. In terms of frequency, this corresponds to a band in the vicinity of 430–770 THz. The spectrum does not, however, contain all the colors that the human eye and brain can distinguish. Unsaturated colors such as pink, or purple variations such as magenta, are absent, for example, because they can be made only by a mix of multiple wavelengths. Colors containing only one wavelength are also called pure colors or spectral colors.
  • 22. 22 Visible wavelengths pass through the "optical window", the region of the electromagnetic spectrum that allows wavelengths to pass largely unattenuated through the Earth's atmosphere. An example of this phenomenon is that clean air scatters blue light more than red wavelengths, and so the midday sky appears blue. The optical window is also referred to as the "visible window" because it overlaps the human visible response spectrum. The near infrared (NIR) window lies just out of the human vision, as well as the Medium Wavelength IR (MWIR) window, and the Long Wavelength or Far Infrared (LWIR or FIR) window, although other animals may experience them. Colors that can be produced by visible light of a narrow band of wavelengths (monochromatic light) are called pure spectral colors. The various color ranges indicated in the illustration are an approximation: The spectrum is continuous, with no clear boundaries between one color and the next. This was a small introduction about Visible range of wavelength. When choosing the visible range we got a range of spectrum and then we experimented it with addition of white Gaussian noise in the spectrum. Now, What is White Gaussian Noise? Additive white Gaussian noise (AWGN) is a basic noise model used in Information theory to mimic the effect of many random processes that occur in nature. The modifiers denote specific characteristics:  Additive because it is added to any noise that might be intrinsic to the information system.  White refers to the idea that it has uniform power across the frequency band for the information system. It is an analogy to the color white which has uniform emissions at all frequencies in the visible spectrum.  Gaussian because it has a normal distribution in the time domain with an average time domain value of zero. Wideband noise comes from many natural sources, such as the thermal vibrations of atoms in conductors (referred to as thermal noise or Johnson-Nyquist noise), shot noise,black body
  • 23. 23 radiation from the earth and other warm objects, and from celestial sources such as the Sun. The central limit theorem of probability theory indicates that the summation of many random processes will tend to have distribution called Gaussian or Normal. AWGN is often used as a channel model in which the only impairment to communication is a linear addition of wideband or white noise with a constant spectral density(expressed as watts per hertz of bandwidth) and a Gaussian distribution of amplitude. The model does not account for fading, frequency selectivity, interference, nonlinearity ordispersion. However, it produces simple and tractable mathematical models which are useful for gaining insight into the underlying behavior of a system before these other phenomena are considered. The AWGN channel is a good model for many satellite and deep space communication links. It is not a good model for most terrestrial links because of multipath, terrain blocking, interference, etc. However, for terrestrial path modeling, AWGN is commonly used to simulate background noise of the channel under study, in addition to multipath, terrain blocking, interference, ground clutter and self interference that modern radio systems encounter in terrestrial operation. Code for addition of Additive White Gaussian Noise 1. For SNR 60 close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,60) 2. For SNR 58 close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,58) 3. For SNR 56 close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,56)
  • 24. 24 4. For SNR 54 dB close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,54) After adding different level of SNR, we get a plot of five different spectral values. Fig 5.2. Different spectral values of single type of soil (Very dark grayish brown silty loam). After finding the different spectral values of different SNR, we applied the same clustering method on it. Code for clustering close all clear all clc path(path,'F:TraineeDakshsoil3'); ATRANS=textread('soil400clus.txt'); X=ATRANS'; for i = 1:301 mi = min(X(:,i));
  • 25. 25 ma = max(X(:,i)); X(:,i) = (X(:,i)-mi)/(ma - mi); end msize = [4 4]; lattice = 'hexa'; % hexagonal lattice neigh = 'gaussian'; % neighborhood function radius_coarse = [4 .5]; % [initial final] trainlen_coarse = 50; % cycles in coarse training radius_fine = [.5 .5]; % [initial final] trainlen_fine = 10; % cycles in fine training smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ... 'sheet'); smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ... trainlen_coarse, 'neigh', neigh); som_cplane('rect',msize, 'none') sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ... trainlen_fine, 'neigh', neigh); labels1 = 'SSSSSSS'; labels2 = '1234567'; M = sm.codebook; norms2 = sum(M.*M,2); op_mat=zeros(7,2); t=1; % input serial no. for u=1:3 X1 = X(u,:)'; Y = norms2 - 2*M*X1; [C,c] = min(Y); op_mat(u,1)=t; op_mat(u,2)=c; ch = mod(c-1,4) + 1; cv = floor((c-1)/4) + 1; if mod(cv,2) == 1 shift1 = -.15; else shift1 = .35; end if u==9 || u==11 shift2 = -.3; elseif u==5 || u==14 shift2 = .3; else shift2 = 0; end % text(ch+shift1,cv+shift2,[labels1(u) ... % labels2(u)], 'FontSize',15); text(ch,cv+shift2,[labels1(u) ... labels2(u)], 'FontSize',15); t=t+1; end
  • 26. 26 Fig 5.3. Cluster for Soil Type 1 Conclusion: In the above figure, S1 is the original value and rest are the other values with added noise. We can see that S2 and S5 are near to S1 hence having same properties. 5.2.1.2 Near Infrared Near-infrared spectroscopy (NIRS) is a spectroscopic method that uses the near- infrared region of the electromagnetic spectrum (from about 700 nm to 2500 nm). Typical applications include medical and physiological diagnostics and research including blood sugar, pulse oximetry, functional neuroimaging, sports medicine, elite sports training, ergonomics, rehabilitation, neonatal research, brain computer interface, urology (bladder contraction), and neurology (neurovascular coupling). There are also applications in other areas as well such as pharmaceutical, food and agrochemical quality control, atmospheric chemistry, and combustion research. Near-infrared spectroscopy is based on molecular overtone and combination vibrations. Such transitions are forbidden by the selection rules of quantum mechanics. As a result, the molar absorptivity in the near-IR region is typically quite small. One advantage is that NIR can typically penetrate much farther into a sample than mid infrared radiation. Near-infrared spectroscopy is, therefore, not a particularly sensitive technique, but it can be very useful in probing bulk material with little or no sample preparation. The molecular overtone and combination bands seen in the near-IR are typically very broad, leading to complex spectra; it can be difficult to assign specific features to specific chemical components. Multivariate (multiple variables) calibration techniques (e.g., principal
  • 27. 27 components analysis, partial least squares, or artificial neural networks) are often employed to extract the desired chemical information. Careful development of a set of calibration samples and application of multivariate calibration techniques is essential for near-infrared analytical methods. Code for addition of Additive White Gaussian Noise 1. For SNR 60 close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,60) 2. For SNR 58 close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,58) 3. For SNR 56 close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,56) 4. For SNR 54 dB close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,54) After adding different level of SNR, we get a plot of five different spectral values.
  • 28. 28 Fig 5.4 Different spectral values of single type of soil (Very dark grayish brown silty loam). After finding the different spectral values of different SNR, we applied the same clustering method on it. Code for clustering close all clear all clc path(path,'C:UsersDTRLDesktopsoil'); ATRANS=textread('noiseclus700.txt'); X=ATRANS'; for i = 1:251 mi = min(X(:,i)); ma = max(X(:,i)); X(:,i) = (X(:,i)-mi)/(ma - mi); end msize = [4 4]; lattice = 'hexa'; % hexagonal lattice neigh = 'gaussian'; % neighborhood function radius_coarse = [4 .5]; % [initial final] trainlen_coarse = 50; % cycles in coarse training radius_fine = [.5 .5]; % [initial final] trainlen_fine = 10; % cycles in fine training smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ... 'sheet'); smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ...
  • 29. 29 trainlen_coarse, 'neigh', neigh); som_cplane('rect',msize, 'none') sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ... trainlen_fine, 'neigh', neigh); labels1 = 'SSSSSSS'; labels2 = '1234567'; M = sm.codebook; norms2 = sum(M.*M,2); op_mat=zeros(7,2); t=1; % input serial no. for u=1:7 X1 = X(u,:)'; Y = norms2 - 2*M*X1; [C,c] = min(Y); op_mat(u,1)=t; op_mat(u,2)=c; ch = mod(c-1,4) + 1; cv = floor((c-1)/4) + 1; if mod(cv,2) == 1 shift1 = -.15; else shift1 = .35; end if u==9 || u==11 shift2 = -.3; elseif u==5 || u==14 shift2 = .3; else shift2 = 0; end % text(ch+shift1,cv+shift2,[labels1(u) ... % labels2(u)], 'FontSize',15); text(ch,cv+shift2,[labels1(u) ... labels2(u)], 'FontSize',15); t=t+1; end op_mat
  • 30. 30 Fig 5.5 Clustering for Soil Type I Conclusion: In the above figure, S1 is the original value and rest are the other values with added noise. We can see that S2 and S6 are near to S1 hence having same properties. 5.2.1.3 Shortwave Infrared Short-wave infrared (SWIR) light is typically defined as light in the 0.9 – 1.7μm wavelength range, but can also be classified from 0.7 – 2.5μm. Since silicon sensors have an upper limit of approximately 1.0μm, SWIR imaging requires unique opticaland electronic components capable of performing in the specific SWIR range. Indium gallium arsenide (inGaAs) sensors are the primary sensors used in SWIR imaging, covering the typical SWIR range, but can extend as low as 550nm to as high as 2.5μm. Although linear line-scan inGaAs sensors are commercially available, area-scan inGaAs sensors are typically ITAR restricted. ITAR, International Treaty and Arms Regulations, is enforced by the government of the United States of America. ITAR restricted products must adhere to strict export and import regulations for them to be manufactured and/or sold within and outside of the USA. Nevertheless, lenses such as SWIR ones can be used for a number of commercial applications with proper licenses.
  • 31. 31 Code for addition of Additive White Gaussian Noise 1. For SNR 60 close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,60) 2. For SNR 58 close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,58) 3. For SNR 56 close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,56) 4. For SNR 54 dB close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,54) After adding different level of SNR, we get a plot of five different spectral values.
  • 32. 32 Fig 5.6 Different spectral values of single type of soil (Very dark grayish brown silty loam). After finding the different spectral values of different SNR, we applied the same clustering method on it. Code for clustering close all clear all clc path(path,'C:UsersDTRLDesktopsoil2'); ATRANS=textread('soil1400clus.txt'); X=ATRANS'; for i = 1:424 mi = min(X(:,i)); ma = max(X(:,i)); X(:,i) = (X(:,i)-mi)/(ma - mi); end msize = [4 4]; lattice = 'hexa'; % hexagonal lattice neigh = 'gaussian'; % neighborhood function radius_coarse = [4 .5]; % [initial final] trainlen_coarse = 50; % cycles in coarse training radius_fine = [.5 .5]; % [initial final] trainlen_fine = 10; % cycles in fine training smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ... 'sheet'); smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ... trainlen_coarse, 'neigh', neigh);
  • 33. 33 som_cplane('rect',msize, 'none') sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ... trainlen_fine, 'neigh', neigh); labels1 = 'SSSSSSS'; labels2 = '1234567'; M = sm.codebook; norms2 = sum(M.*M,2); op_mat=zeros(7,2); t=1; % input serial no. for u=1:5 X1 = X(u,:)'; Y = norms2 - 2*M*X1; [C,c] = min(Y); op_mat(u,1)=t; op_mat(u,2)=c; ch = mod(c-1,4) + 1; cv = floor((c-1)/4) + 1; if mod(cv,2) == 1 shift1 = -.15; else shift1 = .35; end if u==9 || u==11 shift2 = -.3; elseif u==5 || u==14 shift2 = .3; else shift2 = 0; end % text(ch+shift1,cv+shift2,[labels1(u) ... % labels2(u)], 'FontSize',15); text(ch,cv+shift2,[labels1(u) ... labels2(u)], 'FontSize',15); t=t+1; end op_mat
  • 34. 34 Fig 5.7. Cluster for Soil Type I Conclusion: In the above figure, S1 is the original value and rest are the other values with added noise. We can see that S2 and S3 are near to S1 hence having same properties. 5.2.2 Soil Type – II Grayish Brown Loam 5.2.2.1 Visible Range Code for addition of Additive White Gaussian Noise 1. For SNR 60 close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,60) 2. For SNR 58 close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,58)
  • 35. 35 3. For SNR 56 close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,56) 4. For SNR 54 dB close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,54) After adding different level of SNR, we get a plot of five different spectral values. Fig 5.8. Different spectral values of single type of soil (Grayish brown loam).
  • 36. 36 After finding the different spectral values of different SNR, we applied the same clustering method on it. Code for clustering close all clear all clc path(path,'C:UsersDTRLDesktopsoil2'); ATRANS=textread('soil400clus.txt'); X=ATRANS'; for i = 1:301 mi = min(X(:,i)); ma = max(X(:,i)); X(:,i) = (X(:,i)-mi)/(ma - mi); end msize = [4 4]; lattice = 'hexa'; % hexagonal lattice neigh = 'gaussian'; % neighborhood function radius_coarse = [4 .5]; % [initial final] trainlen_coarse = 50; % cycles in coarse training radius_fine = [.5 .5]; % [initial final] trainlen_fine = 10; % cycles in fine training smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ... 'sheet'); smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ... trainlen_coarse, 'neigh', neigh); som_cplane('rect',msize, 'none') sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ... trainlen_fine, 'neigh', neigh); labels1 = 'SSSSSSS'; labels2 = '1234567'; M = sm.codebook; norms2 = sum(M.*M,2); op_mat=zeros(7,2); t=1; % input serial no. for u=1:7 X1 = X(u,:)'; Y = norms2 - 2*M*X1; [C,c] = min(Y); op_mat(u,1)=t; op_mat(u,2)=c; ch = mod(c-1,4) + 1; cv = floor((c-1)/4) + 1; if mod(cv,2) == 1 shift1 = -.15; else shift1 = .35; end
  • 37. 37 if u==9 || u==11 shift2 = -.3; elseif u==5 || u==14 shift2 = .3; else shift2 = 0; end % text(ch+shift1,cv+shift2,[labels1(u) ... % labels2(u)], 'FontSize',15); text(ch,cv+shift2,[labels1(u) ... labels2(u)], 'FontSize',15); t=t+1; end op_mat Fig 5.9. Clustering for Soil Type II Conclusion: In the above figure, S1 is the original value and rest are the other values with added noise. We can see that none is near to S1 hence all are different.
  • 38. 38 5.2.2.2 Near Infrared Code for addition of Additive White Gaussian Noise 1. For SNR 60 close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,60) 2. For SNR 58 close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,58) 3. For SNR 56 close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,56) 4. For SNR 54 dB close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,54)
  • 39. 39 After adding different level of SNR, we get a plot of five different spectral values. Fig 5.10. Different spectral values of single type of soil (Grayish brown loam). After finding the different spectral values of different SNR, we applied the same clustering method on it. Code for clustering close all clear all clc path(path,'C:UsersDTRLDesktopsoil2'); ATRANS=textread('soil700clus.txt'); X=ATRANS'; for i = 1:185 mi = min(X(:,i)); ma = max(X(:,i)); X(:,i) = (X(:,i)-mi)/(ma - mi); end msize = [4 4]; lattice = 'hexa'; % hexagonal lattice neigh = 'gaussian'; % neighborhood function radius_coarse = [4 .5]; % [initial final] trainlen_coarse = 50; % cycles in coarse training radius_fine = [.5 .5]; % [initial final] trainlen_fine = 10; % cycles in fine training smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ... 'sheet'); smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ... trainlen_coarse, 'neigh', neigh);
  • 40. 40 som_cplane('rect',msize, 'none') sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ... trainlen_fine, 'neigh', neigh); labels1 = 'SSSSSSS'; labels2 = '1234567'; M = sm.codebook; norms2 = sum(M.*M,2); op_mat=zeros(7,2); t=1; % input serial no. for u=1:5 X1 = X(u,:)'; Y = norms2 - 2*M*X1; [C,c] = min(Y); op_mat(u,1)=t; op_mat(u,2)=c; ch = mod(c-1,4) + 1; cv = floor((c-1)/4) + 1; if mod(cv,2) == 1 shift1 = -.15; else shift1 = .35; end if u==9 || u==11 shift2 = -.3; elseif u==5 || u==14 shift2 = .3; else shift2 = 0; end % text(ch+shift1,cv+shift2,[labels1(u) ... % labels2(u)], 'FontSize',15); text(ch,cv+shift2,[labels1(u) ... labels2(u)], 'FontSize',15); t=t+1; end op_mat
  • 41. 41 Fig 5.11. Clustering for Soil Type II Conclusion: In the above figure, S1 is the original value and rest are the other values with added noise. We can see that S2 is near to S1 hence having same properties. 5.2.2.3 Shortwave Infrared Code for addition of Additive White Gaussian Noise 1. For SNR 60 close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,60) 2. For SNR 58 close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,58) 3. For SNR 56 close all clear all clc
  • 42. 42 path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,56) 4. For SNR 54 dB close all clear all clc path(path,'F:TraineeDakshsoil3'); A=textread('soil400.txt'); noise=awgn(A,54) After adding different level of SNR, we get a plot of five different spectral values. Fig 5.12. Different spectral values of single type of soil (Grayish brown loam). After finding the different spectral values of different SNR, we applied the same clustering method on it. Code for clustering close all clear all clc path(path,'C:UsersDTRLDesktopsoil2'); ATRANS=textread('soil700clus.txt'); X=ATRANS'; for i = 1:424 mi = min(X(:,i)); ma = max(X(:,i)); X(:,i) = (X(:,i)-mi)/(ma - mi);
  • 43. 43 end msize = [4 4]; lattice = 'hexa'; % hexagonal lattice neigh = 'gaussian'; % neighborhood function radius_coarse = [4 .5]; % [initial final] trainlen_coarse = 50; % cycles in coarse training radius_fine = [.5 .5]; % [initial final] trainlen_fine = 10; % cycles in fine training smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', ... 'sheet'); smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', ... trainlen_coarse, 'neigh', neigh); som_cplane('rect',msize, 'none') sm = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', ... trainlen_fine, 'neigh', neigh); labels1 = 'SSSSSSS'; labels2 = '1234567'; M = sm.codebook; norms2 = sum(M.*M,2); op_mat=zeros(7,2); t=1; % input serial no. for u=1:5 X1 = X(u,:)'; Y = norms2 - 2*M*X1; [C,c] = min(Y); op_mat(u,1)=t; op_mat(u,2)=c; ch = mod(c-1,4) + 1; cv = floor((c-1)/4) + 1; if mod(cv,2) == 1 shift1 = -.15; else shift1 = .35; end if u==9 || u==11 shift2 = -.3; elseif u==5 || u==14 shift2 = .3; else shift2 = 0; end % text(ch+shift1,cv+shift2,[labels1(u) ... % labels2(u)], 'FontSize',15); text(ch,cv+shift2,[labels1(u) ... labels2(u)], 'FontSize',15); t=t+1; end op_mat
  • 44. 44 Fig 5.13. Clustering for Soil Type II Conclusion: In the above figure, S1 is the original value and rest are the other values with added noise. We can see that none is near to S1. Similar experiment was done with other 22 soils. 5.3 Polarimetry Analysis of Different Regions Now in this experiment we had different types of terrain like Urban, Water and Vegetated. So we decided to check the different terrain by calculating there Co and Cross Polarization. Aim: First we plotted the 3D curve for all the values of co and cross in all the different regions and see the changes in the plots. Then we did the SOM clustering and see if these values it also tells the same thing. Procedure: For the clustering we found the wavelets from the array with the help of Haar wavelet at different levels. Coefficients of wavelets were found and with them the input was given to the SOM for clustering. Theory: Cross – Polarization Cross polarized wave (XPW) generation is a nonlinear optical process that can be classified in the group of frequency degenerate [four wave mixing] processes. It can take place only in media with anisotropy of third order nonlinearity. As a result of such nonlinear optical interaction at the output of the nonlinear crystal it is generated a new linearly polarized wave
  • 45. 45 at the same frequency, but with polarization oriented perpendicularly to the polarization of input wave . Fig 5.14. Simplified optical scheme for XPW generation is shown on Fig. 5.14. It consists of a nonlinear crystal plate (thick 1-2 mm) sandwiched between two crossed polarizers. The intensity of generated XPW has cubic dependence with respect to the intensity of the input wave. In fact this is the main reason this effect is so successful for improvement of the contrast of the temporal and spatial profiles of femtosecond pulses. Since cubic crystals are used as nonlinear media they are isotropic with respect to linear properties (there is no birefringence) and because of that the phase and group velocities of both waves XPW and the fundamental wave(FW) are equal:VXPW=VFW and Vgr,XPW=Vgr,FW. Consequence of that is ideal phase and group velocity matching for the two waves propagating along the crystal. This property allows obtaining very good efficiency of the XPW generation process with minimum distortions of the pulse shape and the spectrum. Wavelets A wavelet is a wave-like oscillation with an amplitude that begins at zero, increases, and then decreases back to zero. It can typically be visualized as a "brief oscillation" like one might see recorded by a seismograph or heart monitor. Generally, wavelets are purposefully crafted to have specific properties that make them useful for signal processing. Wavelets can be combined, using a "reverse, shift, multiply and integrate" technique called convolution, with portions of a known signal to extract information from the unknown signal. For example, a wavelet could be created to have a frequency of Middle C and a short duration of roughly a 32nd note. If this wavelet was to be convolved with a signal created from the recording of a song, then the resulting signal would be useful for determining when the Middle C note was being played in the song. Mathematically, the wavelet will correlate with the signal if the unknown signal contains information of similar frequency. This concept of correlation is at the core of many practical applications of wavelet theory.
As a mathematical tool, wavelets can be used to extract information from many different kinds of data, including, but certainly not limited to, audio signals and images. Sets of wavelets are generally needed to analyze data fully. A set of "complementary" wavelets will decompose data without gaps or overlap, so that the decomposition process is mathematically reversible. Thus, sets of complementary wavelets are useful in wavelet-based compression/decompression algorithms, where it is desirable to recover the original information with minimal loss. In formal terms, this representation is a wavelet series representation of a square-integrable function with respect to either a complete, orthonormal set of basis functions, or an overcomplete set or frame of a vector space, for the Hilbert space of square-integrable functions.

Haar Wavelet

In mathematics, the Haar wavelet is a sequence of rescaled "square-shaped" functions which together form a wavelet family or basis. Wavelet analysis is similar to Fourier analysis in that it allows a target function over an interval to be represented in terms of an orthonormal basis. The Haar sequence is now recognised as the first known wavelet basis and is extensively used as a teaching example.

Fig 5.15. Haar Wavelet
The Haar sequence was proposed in 1909 by Alfréd Haar.[1] Haar used these functions to give an example of an orthonormal system for the space of square-integrable functions on the unit interval [0, 1]. The study of wavelets, and even the term "wavelet", did not come until much later. As a special case of the Daubechies wavelet, the Haar wavelet is also known as Db1. The Haar wavelet is also the simplest possible wavelet. The technical disadvantage of the Haar wavelet is that it is not continuous, and therefore not differentiable. This property can, however, be an advantage for the analysis of signals with sudden transitions, such as monitoring of tool failure in machines.[2]

The Haar wavelet's mother wavelet function ψ(t) can be described as

    ψ(t) =  1   for 0 ≤ t < 1/2
           -1   for 1/2 ≤ t < 1
            0   otherwise

Its scaling function φ(t) can be described as

    φ(t) =  1   for 0 ≤ t < 1
            0   otherwise

5.3.1 Urban Region

First we plotted the 3D curves of the co- and cross-polarization values for the urban region and examined the changes in the plots.

Code for 3D Plot

close all; clear all; clc
path(path,'F:Traineeurbanroidata');    % folder containing the urban ROI polarisation files

% Five co-polarisation and five cross-polarisation signatures of the urban ROI
files  = {'urbanroi1_co.txt',   'urbanroi2_co.txt',   'urbanroi3_co.txt', ...
          'urbanroi4_co.txt',   'urbanroi5_co.txt', ...
          'urbanroi1_cross.txt','urbanroi2_cross.txt','urbanroi3_cross.txt', ...
          'urbanroi4_cross.txt','urbanroi5_cross.txt'};
titles = {'CoPolarisation Plot 1','CoPolarisation Plot 2','CoPolarisation Plot 3', ...
          'CoPolarisation Plot 4','CoPolarisation Plot 5', ...
          'CrossPolarisation Plot 1','CrossPolarisation Plot 2','CrossPolarisation Plot 3', ...
          'CrossPolarisation Plot 4','CrossPolarisation Plot 5'};

phi_orient = (linspace(0,180,181))';   % orientation angle phi
xi_ellipt  = (linspace(-45,45,91))';   % ellipticity angle xi

for k = 1:10
    P = textread(files{k});
    subplot(5,2,k)
    mesh(xi_ellipt, phi_orient, P)
    colorbar
    axis([-45 45 0 180 0 1])
    title(titles{k})
    xlabel('<-----ellipt xi ------>', 'rotation', 15)
    ylabel('<----orient phi ------>', 'rotation', -20)
end
Fig 5.16. 3D Plot

Code for finding the Wavelet Coefficient (Level 2)

clear all; clc; close all
% The current extension mode is zero-padding (see dwtmode).
% Load the original image; X contains the loaded image.
% Perform decomposition at level 2 of X using db1.
path(path,'F:Trainee');
X = textread('urbanroi3_cross.txt');
[c,s] = wavedec2(X,2,'haar');
sizex = size(X)
sizec = size(c)
val_s = s
% Extract detail coefficients at level 2 in each orientation,
% from wavelet decomposition structure [c,s].
D = detcoef2('c',c,s,2)
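To make concrete what the Haar decomposition feeds into the clustering, the following minimal sketch (not part of the original experiment; it uses a small hypothetical test matrix) applies a single-level 2-D Haar transform and shows that the approximation block holds scaled local averages while the three detail blocks hold horizontal, vertical and diagonal differences:

% Illustration only: one level of the 2-D Haar DWT on a small test matrix.
A = magic(4);                         % hypothetical 4-by-4 input
[cA, cH, cV, cD] = dwt2(A, 'haar');   % approximation + horizontal/vertical/diagonal details
% With the orthonormal Haar filters, each cA entry is half the sum of a
% 2-by-2 block of A, while cH, cV and cD measure differences across that block.
disp(cA), disp(cH), disp(cV), disp(cD)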
Code for clustering data at Level 2

close all; clear all; clc
path(path,'F:Traineeurbanroidatahaar');
X = textread('totaln3.txt');

% Normalise every feature column to the range [0,1]
for i = 1:5
    mi = min(X(:,i)); ma = max(X(:,i));
    X(:,i) = (X(:,i)-mi)/(ma - mi);
end

msize           = [4 4];
lattice         = 'hexa';       % hexagonal lattice
neigh           = 'gaussian';   % neighborhood function
radius_coarse   = [4 .5];       % [initial final]
trainlen_coarse = 50;           % cycles in coarse training
radius_fine     = [.5 .5];      % [initial final]
trainlen_fine   = 10;           % cycles in fine training

smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', 'sheet');
smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', trainlen_coarse, 'neigh', neigh);
som_cplane('rect', msize, 'none')
sm  = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', trainlen_fine, 'neigh', neigh);

labels1 = 'CCCCC';
labels2 = '12345';
M = sm.codebook;
norms2 = sum(M.*M,2);
op_mat = zeros(6348,2);         % one row per input vector
t = 1;                          % input serial no.
for u = 1:6348
    X1 = X(u,:)';
    Y = norms2 - 2*M*X1;
    [C,c] = min(Y);
    op_mat(u,1) = t; op_mat(u,2) = c;
    ch = mod(c-1,4) + 1;
    cv = floor((c-1)/4) + 1;
    if mod(cv,2) == 1, shift1 = -.15; else shift1 = .35; end
    if u==9 || u==11, shift2 = -.3; elseif u==5 || u==14, shift2 = .3; else shift2 = 0; end
    if u <= length(labels1)     % only the first five inputs carry class labels
        % text(ch+shift1, cv+shift2, [labels1(u) labels2(u)], 'FontSize',15);
        text(ch, cv+shift2, [labels1(u) labels2(u)], 'FontSize',15);
    end
    t = t+1;
end
op_mat

Fig 5.17. Clustering at Level 2

Conclusion: In the figure above, the labels mark the different classes (co- and cross-polarization) derived from the wavelet coefficients. We can see that C3 and C4 are mapped near each other and therefore have similar properties.

Code for finding the Wavelet Coefficient (Level 3)

clear all; clc; close all
% The current extension mode is zero-padding (see dwtmode).
% Load the original image; X contains the loaded image.
% Perform decomposition at level 3 of X using db1.
path(path,'F:Traineeurbanroidata');
X = textread('urbanroi4_co.txt');
[c,s] = wavedec2(X,3,'haar');
sizex = size(X)
sizec = size(c)
val_s = s
% Extract detail coefficients at level 3 in each orientation,
% from wavelet decomposition structure [c,s].
D = detcoef2('c',c,s,3)

Code for clustering data at Level 3

close all; clear all; clc
path(path,'F:Traineeurbanroidatahaar');
X = textread('totaln3.txt');

% Normalise every feature column to the range [0,1]
for i = 1:5
    mi = min(X(:,i)); ma = max(X(:,i));
    X(:,i) = (X(:,i)-mi)/(ma - mi);
end

msize           = [4 4];
lattice         = 'hexa';       % hexagonal lattice
neigh           = 'gaussian';   % neighborhood function
radius_coarse   = [4 .5];       % [initial final]
trainlen_coarse = 50;           % cycles in coarse training
radius_fine     = [.5 .5];      % [initial final]
trainlen_fine   = 10;           % cycles in fine training

smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', 'sheet');
smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', trainlen_coarse, 'neigh', neigh);
som_cplane('rect', msize, 'none')
sm  = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', trainlen_fine, 'neigh', neigh);

labels1 = 'CCCCC';
labels2 = '12345';
M = sm.codebook;
norms2 = sum(M.*M,2);
op_mat = zeros(1656,2);         % one row per input vector
t = 1;                          % input serial no.
for u = 1:1656
    X1 = X(u,:)';
    Y = norms2 - 2*M*X1;
    [C,c] = min(Y);
    op_mat(u,1) = t; op_mat(u,2) = c;
    ch = mod(c-1,4) + 1;
    cv = floor((c-1)/4) + 1;
    if mod(cv,2) == 1, shift1 = -.15; else shift1 = .35; end
    if u==9 || u==11, shift2 = -.3; elseif u==5 || u==14, shift2 = .3; else shift2 = 0; end
    if u <= length(labels1)     % only the first five inputs carry class labels
        % text(ch+shift1, cv+shift2, [labels1(u) labels2(u)], 'FontSize',15);
        text(ch, cv+shift2, [labels1(u) labels2(u)], 'FontSize',15);
    end
    t = t+1;
end
op_mat

Fig 5.18. Clustering at Level 3

Conclusion: In the figure above, the labels mark the different classes (co- and cross-polarization) derived from the wavelet coefficients. We can see that C3 and C4 are mapped near each other and therefore have similar properties.
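Nearness on the map can also be checked numerically rather than by eye. A small hedged sketch (variable names are taken from the listing above; the node indices are simply the winners reported in op_mat for the items labelled C3 and C4):

% Sketch only: quantify how similar two classes are by comparing the
% codebook vectors of their winning nodes (small distance = similar classes).
c3 = op_mat(3,2);                                   % node that item C3 was mapped to
c4 = op_mat(4,2);                                   % node that item C4 was mapped to
d34 = norm(sm.codebook(c3,:) - sm.codebook(c4,:))   % distance between the two model vectors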
Code for finding the Wavelet Coefficient (Level 4)

clear all; clc; close all
% The current extension mode is zero-padding (see dwtmode).
% Load the original image; X contains the loaded image.
% Perform decomposition at level 4 of X using db1.
path(path,'F:Traineeurbanroidata');
X = textread('urbanroi3_cross.txt');
[c,s] = wavedec2(X,4,'haar');
sizex = size(X)
sizec = size(c)
val_s = s
% Extract detail coefficients at level 4 in each orientation,
% from wavelet decomposition structure [c,s].
D = detcoef2('c',c,s,4)

Code for clustering data at Level 4

close all; clear all; clc
path(path,'F:Traineeurbanroidatahaar');
X = textread('totaln4.txt');

% Normalise every feature column to the range [0,1]
for i = 1:5
    mi = min(X(:,i)); ma = max(X(:,i));
    X(:,i) = (X(:,i)-mi)/(ma - mi);
end

msize           = [4 4];
lattice         = 'hexa';       % hexagonal lattice
neigh           = 'gaussian';   % neighborhood function
radius_coarse   = [4 .5];       % [initial final]
trainlen_coarse = 50;           % cycles in coarse training
radius_fine     = [.5 .5];      % [initial final]
trainlen_fine   = 10;           % cycles in fine training

smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', 'sheet');
smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', trainlen_coarse, 'neigh', neigh);
som_cplane('rect', msize, 'none')
sm  = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', trainlen_fine, 'neigh', neigh);

labels1 = 'CCCCC';
labels2 = '12345';
M = sm.codebook;
norms2 = sum(M.*M,2);
op_mat = zeros(432,2);          % one row per input vector
t = 1;                          % input serial no.
for u = 1:432
    X1 = X(u,:)';
    Y = norms2 - 2*M*X1;
    [C,c] = min(Y);
    op_mat(u,1) = t; op_mat(u,2) = c;
    ch = mod(c-1,4) + 1;
    cv = floor((c-1)/4) + 1;
    if mod(cv,2) == 1, shift1 = -.15; else shift1 = .35; end
    if u==9 || u==11, shift2 = -.3; elseif u==5 || u==14, shift2 = .3; else shift2 = 0; end
    if u <= length(labels1)     % only the first five inputs carry class labels
        % text(ch+shift1, cv+shift2, [labels1(u) labels2(u)], 'FontSize',15);
        text(ch, cv+shift2, [labels1(u) labels2(u)], 'FontSize',15);
    end
    t = t+1;
end
op_mat

Fig 5.19. Clustering at Level 4

Conclusion: In the figure above, the labels mark the different classes (co- and cross-polarization) derived from the wavelet coefficients. We can see that C1 and C5 are mapped near each other and therefore have similar properties.

Code for finding the Wavelet Coefficient (Level 5)

clear all; clc; close all
% The current extension mode is zero-padding (see dwtmode).
% Load the original image; X contains the loaded image.
% Perform decomposition at level 5 of X using db1.
path(path,'F:Trainee');
X = textread('urbanroi5_co.txt');
[c,s] = wavedec2(X,5,'haar');
sizex = size(X)
sizec = size(c)
val_s = s
% Extract detail coefficients at level 5 in each orientation,
% from wavelet decomposition structure [c,s].
D = detcoef2('c',c,s,5)
Code for clustering data at Level 5

close all; clear all; clc
path(path,'F:Traineeurbanroidatahaar');
X = textread('totaln5.txt');

% Normalise every feature column to the range [0,1]
for i = 1:5
    mi = min(X(:,i)); ma = max(X(:,i));
    X(:,i) = (X(:,i)-mi)/(ma - mi);
end

msize           = [4 4];
lattice         = 'hexa';       % hexagonal lattice
neigh           = 'gaussian';   % neighborhood function
radius_coarse   = [4 .5];       % [initial final]
trainlen_coarse = 50;           % cycles in coarse training
radius_fine     = [.5 .5];      % [initial final]
trainlen_fine   = 10;           % cycles in fine training

smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', 'sheet');
smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', trainlen_coarse, 'neigh', neigh);
som_cplane('rect', msize, 'none')
sm  = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', trainlen_fine, 'neigh', neigh);

labels1 = 'CCCCC';
labels2 = '12345';
M = sm.codebook;
norms2 = sum(M.*M,2);
op_mat = zeros(108,2);          % one row per input vector
t = 1;                          % input serial no.
for u = 1:108
    X1 = X(u,:)';
    Y = norms2 - 2*M*X1;
    [C,c] = min(Y);
    op_mat(u,1) = t; op_mat(u,2) = c;
    ch = mod(c-1,4) + 1;
    cv = floor((c-1)/4) + 1;
    if mod(cv,2) == 1, shift1 = -.15; else shift1 = .35; end
    if u==9 || u==11, shift2 = -.3; elseif u==5 || u==14, shift2 = .3; else shift2 = 0; end
    if u <= length(labels1)     % only the first five inputs carry class labels
        % text(ch+shift1, cv+shift2, [labels1(u) labels2(u)], 'FontSize',15);
        text(ch, cv+shift2, [labels1(u) labels2(u)], 'FontSize',15);
    end
    t = t+1;
end
op_mat

Fig 5.20. Clustering at Level 5

Conclusion: In the figure above, the labels mark the different classes (co- and cross-polarization) derived from the wavelet coefficients. We can see that C1 and C4 are mapped near each other and therefore have similar properties.

Code for finding the Wavelet Coefficient (Level 6)

clear all; clc; close all
% The current extension mode is zero-padding (see dwtmode).
% Load the original image; X contains the loaded image.
% Perform decomposition at level 6 of X using db1.
path(path,'F:Trainee');
X = textread('urbanroi5_cross.txt');
[c,s] = wavedec2(X,6,'haar');
sizex = size(X)
sizec = size(c)
val_s = s
% Extract detail coefficients at level 6 in each orientation,
% from wavelet decomposition structure [c,s].
D = detcoef2('c',c,s,6)

Code for clustering data at Level 6

close all; clear all; clc
path(path,'F:Traineeurbanroidatahaar');
X = textread('totaln6.txt');

% Normalise every feature column to the range [0,1]
for i = 1:5
    mi = min(X(:,i)); ma = max(X(:,i));
    X(:,i) = (X(:,i)-mi)/(ma - mi);
end

msize           = [4 4];
lattice         = 'hexa';       % hexagonal lattice
neigh           = 'gaussian';   % neighborhood function
radius_coarse   = [4 .5];       % [initial final]
trainlen_coarse = 50;           % cycles in coarse training
radius_fine     = [.5 .5];      % [initial final]
trainlen_fine   = 10;           % cycles in fine training

smI = som_lininit(X, 'msize', msize, 'lattice', lattice, 'shape', 'sheet');
smC = som_batchtrain(smI, X, 'radius', radius_coarse, 'trainlen', trainlen_coarse, 'neigh', neigh);
som_cplane('rect', msize, 'none')
sm  = som_batchtrain(smC, X, 'radius', radius_fine, 'trainlen', trainlen_fine, 'neigh', neigh);

labels1 = 'CCCCC';
labels2 = '12345';
M = sm.codebook;
norms2 = sum(M.*M,2);
op_mat = zeros(36,2);           % one row per input vector
t = 1;                          % input serial no.
for u = 1:36
    X1 = X(u,:)';
    Y = norms2 - 2*M*X1;
    [C,c] = min(Y);
    op_mat(u,1) = t; op_mat(u,2) = c;
    ch = mod(c-1,4) + 1;
    cv = floor((c-1)/4) + 1;
    if mod(cv,2) == 1, shift1 = -.15; else shift1 = .35; end
    if u==9 || u==11, shift2 = -.3; elseif u==5 || u==14, shift2 = .3; else shift2 = 0; end
    if u <= length(labels1)     % only the first five inputs carry class labels
        % text(ch+shift1, cv+shift2, [labels1(u) labels2(u)], 'FontSize',15);
        text(ch, cv+shift2, [labels1(u) labels2(u)], 'FontSize',15);
    end
    t = t+1;
end
op_mat

Fig 5.21. Clustering at Level 6

Conclusion: In the figure above, the labels mark the different classes (co- and cross-polarization) derived from the wavelet coefficients. We can see that C3 and C5 are mapped near each other and therefore have similar properties.
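A quick way to see how evenly the coefficient vectors spread over the map at any decomposition level is the SOM Toolbox hit count. A minimal hedged sketch (it assumes the trained map sm and the normalised data X from the listing above are still in the workspace):

% Sketch only: how many input vectors are mapped onto each of the 16 nodes.
hits = som_hits(sm, X);          % one hit count per codebook vector
bar(hits)                        % uneven bars hint at cluster structure on the map
xlabel('map node index'), ylabel('number of hits')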
6. Additional Feature of SOM

A graphic display based on these distances, called the U matrix, is explained in this section. The clustering tendency of the data, or of the models that describe them, can be shown graphically based on the magnitudes of the vectorial distances between neighboring models in the map, as shown in Fig. 6.1 below. In it, the number of hexagonal cells has first been increased from 18 by 17 to 35 by 33 in order to create blank interstitial cells that can be colored in shades of gray or in pseudocolors to emphasize the cluster borders. The interstitial cells are created automatically when the instructions given below are executed.

The U matrix. A graphic display called the U matrix was developed by Ultsch [87], as well as Kraaijveld et al. [47], to illustrate the degree of clustering tendency on the SOM. In the basic method, interstitial (here hexagonal) cells are added between the original SOM cells in the display. So, if we have an SOM array of 18 by 17 cells, the array size becomes 35 by 33 after the new cells are added. Notice, however, that the extra cells are not involved in the SOM algorithm; only the original 18 by 17 cells were trained. The average (smoothed) distances between the nearest SOM models are represented by light colors for small mean differences and darker colors for larger mean differences. A "cluster landscape" formed over the SOM then visualizes the degree of classification. The groundwork for the U matrix is generated by the instructions

colormapigray = ones(64,3) - colormap('gray');
colormap(colormapigray);
msize = [18 17];
Um = som_umat(sm);
som_cplane('hexaU', sm.topol.msize, Um(:));

In this case we need not draw the blank SOM groundwork with the instruction som_cplane('hexa', msize, 'none'). The example with the countries in August 2014 is now represented together with the U-matrix "landscape" in Fig. 6.1. It shows darker "ravines" between the clusters.

Annotation of the SOM nodes. After that, the country acronyms are written on the map with the text instruction. The complete map is shown in Fig. 6.1.
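The same instructions can equally be applied to the 4-by-4 maps trained earlier in this report. A minimal hedged sketch (it assumes the trained map sm from one of the clustering listings is still in the workspace):

% Sketch only: U matrix of the 4-by-4 map trained in the experiments above.
colormapigray = ones(64,3) - colormap('gray');   % inverted gray scale: dark = large distance
colormap(colormapigray);
Um = som_umat(sm);                               % mean distances between neighbouring models
som_cplane('hexaU', sm.topol.msize, Um(:));      % 7-by-7 display: 16 model cells plus interstitial cells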
Fig. 6.1. The SOM, with the U matrix, of the financial status of 50 countries

A particular remark is due here. Notice that there are plenty of unlabelled areas in the SOM. During training, when the model vectors had not yet reached their final values, the "winner nodes" for certain countries were located in completely different places than in the final map; nonetheless, the models at those nodes gathered memory traces during training too. Thus, these nodes have learned more or less wrong and even random values during the coarse training, with the result that the vectorial differences of the models in those places are large. So an SOM with unique items mapped onto it, with plenty of blank space between them, is not particularly suitable for demonstrating the U matrix, although some interesting details can still be found in it.
7. Future Enhancements

Artificial neural networks are currently an active research area in data compression. Although the GSOM algorithm for data compression is rapidly finding use in many applied fields and technologies, it still has a limitation: it cannot compress larger audio and video files. A future enhancement would therefore be to improve the compression ratio achievable on larger audio and video files.

8. Bibliography and References

1. IJCSNS International Journal of Computer Science and Network Security, Vol. 14, No. 11, November 2014.
2. https://en.wikipedia.org/wiki/Haar_wavelet
3. https://en.wikipedia.org/wiki/Wavelet
4. https://en.wikipedia.org/wiki/Cross-polarized_wave_generation
5. http://www.edmundoptics.com/resources/application-notes/imaging/what-is-swir/
6. https://en.wikipedia.org/wiki/Near-infrared_spectroscopy
7. https://en.wikipedia.org/wiki/Additive_white_Gaussian_noise
8. https://en.wikipedia.org/wiki/Visible_spectrum
9. https://en.wikipedia.org/wiki/Visible_spectrum#/media/File:Linear_visible_spectrum.svg
10. Kohonen, T., MATLAB Implementations and Applications of the Self-Organizing Map, Unigrafia Oy, Helsinki, Finland, 2014.
11. Bacao, F. and Lobo, V., Introduction to Kohonen's Self-Organising Maps.