1. UNIT-4: Artificial Neural Network and Deep Learning
Dr. Radhey Shyam
Professor
Department of Computer Science and Engineering
BIET Lucknow
Following slides have been prepared by Dr. Radhey Shyam, with grateful acknowledgement of others who
made their course contents freely available. Feel free to reuse these slides for your own academic purposes.
1252 MIPRO 2017/CTS
Brief Review of Self-Organizing Maps
Dubravko Miljković
Hrvatska elektroprivreda, Zagreb, Croatia
dubravko.miljkovic@hep.hr
Abstract - As a particular type of artificial neural network,
self-organizing maps (SOMs) are trained using
unsupervised, competitive learning to produce a low-
dimensional, discretized representation of the input space of
the training samples, called a feature map. Such a map
retains the principal features of the input data. Self-organizing
maps are known for their clustering, visualization and
classification capabilities. In this brief review paper the basic
tenets, including motivation, architecture, mathematical
description and applications, are reviewed.
I. INTRODUCTION
Among numerous neural network architectures, a
particularly interesting architecture was introduced by the
Finnish professor Teuvo Kohonen in the 1980s, [1,2]. The self-
organizing map (SOM), sometimes also called a Kohonen
map, uses unsupervised, competitive learning to produce a
low-dimensional, discretized representation of presented
high-dimensional data, while simultaneously preserving
similarity relations between the presented data items. Such a
low-dimensional representation is called a feature map,
hence the map in the name. This brief review paper attempts
to introduce the reader to SOMs, covering in short the basic
tenets, the underlying biological motivation, the architecture,
the mathematical description and various applications, [3-10].
II. NEURAL NETWORKS
Human and animal brains are highly complex,
nonlinear and parallel systems, consisting of billions of
neurons integrated into numerous neural networks, [3].
Neural networks within a brain are massively parallel
distributed processing systems suitable for storing
knowledge in the form of past experiences and making it
available for future use. They are particularly suitable for
the class of problems where it is difficult to propose an
analytical solution convenient for algorithmic
implementation.
A. Biological Motivation
After millions of years of evolution, the brain in animals
and humans has evolved into a massively parallel stack of
computing power capable of dealing with the tremendous
varieties of situations it can encounter. The biological
neural networks are natural intelligent information
processors. Artificial neural networks (ANN) constitute
computing paradigm motivated by the neural structure of
biological systems, [6]. ANNs employ a computational
approach based on a large collection of artificial neurons
that are much simplified representations of biological
neurons. Synapses that ensure communication among
biological neurons are replaced with neuron input weights.
Adjustment of connection weights is performed by one
of numerous learning algorithms. ANNs have very simple
principles, but their behavior can be very complex. They
have a capability to learn, generalize, associate data and
are fault tolerant. The history of the ANNs begins in the
1940s, but the first significant step came in 1957 with the
introduction of Rosenblatt’s perceptron. The evolution of
the most popular ANN paradigms is shown in Fig. 1, [10].
B. Basic Architectures
An artificial neural network is an interconnected
assembly of simple processing elements, called artificial
neurons (also called units or nodes), whose functionality
mimics that of a biological neuron, [4]. Individual neurons
can be combined into layers, and there are single and
multi-layer networks, with or without feedback. The most
common types of ANNs are shown in Fig. 2, [11]. Among
training algorithms the most popular is backpropagation
and its variants. ANNs can be used for solving a wide
variety of problems. Before use, they have to be
trained. During training, the network adjusts its weights. In
supervised training, input/output pairs are presented to the
network by an external teacher and the network tries to learn
the desired input-output mapping. Some neural architectures
(like SOM) can learn without supervision (unsupervised)
from the training data without specified input/output pairs.
Figure 1. Evolution of artificial neural network paradigms, based on [10]
Figure 2. Most common artificial neural networks, according to [11]
III. SELF-ORGANIZING MAPS
The self-organizing map (SOM), as a particular neural
network paradigm, has found its inspiration in self-
organizing and biological systems.
A. Self-Organized Systems
Self-organizing systems are types of systems that can
change their internal structure and function in response to
external circumstances and stimuli, [12-15]. Elements of
such a system can influence or organize other elements
within the same system, resulting in a greater stability of
structure or function of the whole against external
fluctuations, [12]. The main aspects of self-organizing
systems are increase of complexity, emergence of new
phenomena (the whole is more than the sum of its parts)
and internal regulation by positive and negative feedback
loops. In 1952 Turing published a paper regarding the
mathematical theory of pattern formation in biology, and
found that global order in a system can arise from local
interactions, [13]. This often produces a system with new,
emergent properties that differ qualitatively from those of
components without interactions, [16]. Self-organizing
systems exist in nature, in both the non-living and the
living world; they also exist in man-made systems and in
the world of abstract ideas, [12].
B. Self-Organizing Map
Neural networks of neurons with lateral communication,
topologically organized as self-organizing maps, are
common in neurobiology. Various neural functions are
mapped onto identifiable regions of the brain, Fig. 3, [17].
In such topographic maps the neighborhood relation is
preserved. The brain mostly does not have desired
input-output pairs available and has to learn in
unsupervised mode.
Figure 3. Maps in brain, [17]
A SOM is a single-layer neural network with units set
along an n-dimensional grid. Most applications use a two-
dimensional, rectangular grid, although many
applications also use hexagonal grids, and some use one-,
three-, or higher-dimensional spaces. SOMs produce low-
dimensional projection images of high-dimensional data
distributions, in which the similarity relations between
the data items are preserved, [18].
C. Principles of Self-Organization in SOMs
The following three processes are common to self-
organization in SOMs, [7,19,20]:
1. Competitive Process
For each input pattern vector presented to the map, all
neurons calculate the values of a discriminant function. The
neuron whose weight vector is most similar to the input
pattern vector is the winner (best matching unit, BMU).
2. Cooperative Process
The winning neuron (BMU) determines the spatial location
of a topological neighborhood of excited neurons.
Neurons from this neighborhood may then cooperate.
3. Synaptic Adaptation
Excited neurons can modify their values of the
discriminant function related to the presented input
pattern vector through the process of weight adjustments.
D. Common Topologies
SOM topologies can be in one, two (most common)
or even three dimensions, [2-10]. The two most used two-
dimensional grids in SOMs are the rectangular and the
hexagonal grid. Three-dimensional topologies can take the
form of a cylinder or a toroid. 1-D (linear) and 2-D grids are
illustrated in Fig. 4, with corresponding SOMs in Fig. 5
and Fig. 6, according to [19].
Figure 4. Most common grids and neuron neighborhoods
Figure 5. 1-D SOM network, according to [19].
Figure 6. 2-D SOM network, according to [19].
IV. LEARNING ALGORITHM
In 1982 Professor Kohonen presented his SOM
algorithm, [1]. Further advancement in the field came with
the second edition of his book “Self-Organization and
Associative Memory” in 1988, [2].
A. Measures of Distance and Similarity
To determine the similarity between the input vector and
the neurons, measures of distance are used. Some popular
distances between an input pattern and SOM units are, [21]:
Euclidean
Correlation
Direction cosine
Block distance
In real applications the squared Euclidean distance is
most often used, (1):

$$d_j = \sum_i \left(x_i - w_{ij}\right)^2 \qquad (1)$$
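The squared Euclidean distance of Eq. (1) can be sketched in a few lines of pure Python; the four-unit map weights and the input vector below are invented for illustration:

```python
# Squared Euclidean distance between an input vector x and each
# unit's weight vector, as in Eq. (1). The 2x2 map (four units with
# 2-D weights) and all values are made up for illustration.

def squared_distances(x, weights):
    """Return d_j = sum_i (x_i - w_ij)^2 for every unit j."""
    return [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in weights]

weights = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x = [0.9, 0.1]

d = squared_distances(x, weights)
print(d)  # unit 1 (weights [1, 0]) has the smallest distance
```

The unit with the smallest d_j is the best matching unit used in the competitive process above.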
B. Neighborhood Functions
Neurons within a grid interact among themselves using
a neighborhood function. Neighborhood functions most often
assume the form of the Mexican hat, (2), Fig. 7, which has a
biological motivation (it inhibits some neurons in the
vicinity of the winning neuron), although other functions
(Gaussian, cone and cylinder) are also possible, [22].
The ordering algorithm is robust to the choice of function
type if the neighborhood radius and learning rate decrease to
zero. A popular choice is exponential decay.
$$h_{ij,mn} = g\left(\lVert w_{ij}-w_{mn}\rVert, r\right) = \left(1-\frac{\lVert w_{ij}-w_{mn}\rVert^{2}}{r^{2}}\right)e^{-\frac{\lVert w_{ij}-w_{mn}\rVert^{2}}{2r^{2}}} \qquad (2)$$
Figure 7. Mexican hat function
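As a sketch, assuming the Mexican-hat form reconstructed in Eq. (2) above (the radius value and the simpler Gaussian alternative are illustrative choices):

```python
import math

# Mexican-hat ("Ricker"-style) neighborhood value as a function of
# the distance d between two units, for radius r. A Gaussian
# neighborhood, also common in practice, is shown for comparison.
# Normalization constants are illustrative, not taken from the paper.

def mexican_hat(d, r):
    q = (d / r) ** 2
    return (1.0 - q) * math.exp(-q / 2.0)

def gaussian(d, r):
    return math.exp(-((d / r) ** 2) / 2.0)

print(mexican_hat(0.0, 2.0))  # 1.0 at the winning neuron
print(mexican_hat(3.0, 2.0))  # negative: inhibits units in the vicinity
```

The negative lobe for d > r is what "rejects some neurons in the vicinity of the winning neuron", as described above; the Gaussian stays positive everywhere.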
C. Initialization of Self-Organizing Maps
Before training a SOM, its units (i.e., their weights) should be
initialized. Common approaches are, [2,23]:
1. Use of random values, completely independent of the
training data set
2. Use of random samples from the input training data
3. Initialization that tries to reflect the distribution of
the data (Principal Components)
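The first two initialization strategies listed above can be sketched as follows (pure Python; the function names and toy dataset are illustrative, and PCA-based initialization is omitted for brevity):

```python
import random

# Two common SOM initialization strategies: (1) random values
# independent of the training data, (2) random samples drawn from
# the training data itself. Function names are illustrative.

def init_random(n_units, dim, rng):
    return [[rng.random() for _ in range(dim)] for _ in range(n_units)]

def init_from_samples(n_units, data, rng):
    return [list(rng.choice(data)) for _ in range(n_units)]

rng = random.Random(0)
data = [[0.1, 0.2], [0.8, 0.9], [0.5, 0.5]]
w1 = init_random(4, 2, rng)        # strategy 1
w2 = init_from_samples(4, data, rng)  # strategy 2
```

Sample-based initialization already places units inside the data distribution, which typically shortens the ordering phase.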
D. Training
Self-organizing maps use the most popular algorithm
of the unsupervised learning category, [2]. The criterion
D that is minimized is the sum of distances between all
input vectors xn and their respective winning neuron
weights wi, calculated at the end of each epoch, (3), [21]:
$$D = \sum_{i=1}^{k}\,\sum_{n \in c_i} \lVert x_n - w_i \rVert^{2} \qquad (3)$$
Training of self-organizing maps, [2,18], can be
accomplished in two ways: as sequential or batch training.
1. Sequential training
single vector at a time is presented to the map
adjustment of neuron weights is made after a
presentation of each vector
suitable for on-line learning
2. Batch training
the whole dataset is presented before any
adjustment to the neuron weights is made
suitable for off-line learning
Here are steps for the sequential training, [3,7,19,22]:
1. Initialization
Initialize the neuron weights (iteration step n=0)
2. Sampling
Randomly sample a vector x(n) from the dataset
3. Similarity Matching
Find the best matching unit (BMU), c, with
weights wbmu=wc, (4):
$$c = \arg\min_i \lVert x(n) - w_i(n) \rVert \qquad (4)$$
4. Updating
Update each unit i with the following rule:
$$w_i(n+1) = w_i(n) + \alpha(n)\, h_{i,\mathrm{bmu}}\big(n, r(n)\big)\, \big[x(n) - w_i(n)\big] \qquad (5)$$
5. Continuation
Increment n. Repeat steps 2-4 until a stopping
criterion is met (e.g. the fixed number of
iterations or the map has reached a stable state).
For convergence and stability to be guaranteed, the
learning rate α(n) and neighborhood radius r(n) decrease
with each iteration toward zero, [22].
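The five sequential-training steps above can be collected into a minimal sketch (pure Python, 1-D map; the Gaussian neighborhood, the decay schedules and all constants are illustrative choices, not taken from the paper):

```python
import math
import random

# Minimal sequential SOM training following steps 1-5 above:
# initialize, sample, find the BMU, update weights with a decaying
# Gaussian neighborhood, repeat. A 1-D map (a line of units) keeps
# the grid distance |i - c| simple. All constants are illustrative.

def train_som(data, n_units, n_iter, seed=0):
    rng = random.Random(seed)
    w = [list(rng.choice(data)) for _ in range(n_units)]  # 1. init from samples
    alpha0, r0 = 0.5, n_units / 2.0
    for n in range(n_iter):
        alpha = alpha0 * math.exp(-n / n_iter)            # decaying rate
        r = max(r0 * math.exp(-n / n_iter), 0.5)          # decaying radius
        x = rng.choice(data)                              # 2. sampling
        d = [sum((xi - wi) ** 2 for xi, wi in zip(x, u)) for u in w]
        c = min(range(n_units), key=d.__getitem__)        # 3. BMU, Eq. (4)
        for i in range(n_units):                          # 4. update, Eq. (5)
            h = math.exp(-(((i - c) / r) ** 2) / 2.0)     # neighborhood
            w[i] = [wi + alpha * h * (xi - wi) for wi, xi in zip(w[i], x)]
    return w                                              # 5. loop continues

data = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
w = train_som(data, n_units=4, n_iter=500)
```

Because each update is a convex step toward a data point, the weights stay inside the data's bounding box while the neighborhood pulls grid-adjacent units toward similar regions of the input space.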
SOM Sample Hits, Fig. 8, show the number of input
vectors that each unit in the SOM classifies, [24].
Figure 8. SOM Sample Hits, [24]
During the training process two phases may be
distinguished, [7,18]:
1. Self-organizing (ordering) phase:
Topological ordering in the map takes place (roughly the
first 1000 iterations). The learning rate α(n) and
neighborhood radius r(n) are decreasing.
2. Convergence (fine tuning) phase:
This is fine tuning that provides an accurate statistical
representation of the input space. It typically lasts at
least (500 × number of neurons) iterations. The small
learning rate α(n) and neighborhood radius r(n) may
be kept fixed (e.g., at the last values from the previous phase).
After the training of the SOM is completed, neurons
may be labeled if labeled pattern vectors are available.
E. Classification
Find the best matching unit (BMU), c, (6):

$$c = \arg\min_i \lVert x - w_i \rVert \qquad (6)$$
Test pattern x belongs to the class represented by the best
matching unit c.
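A minimal sketch of this classification rule (the weights and class labels below are invented for illustration; in practice labels come from labeled pattern vectors applied after training, as described in section IV):

```python
# Classify a test pattern x by assigning it the label of its best
# matching unit, as in Eq. (6). The two-unit map and labels are
# made up for illustration.

def classify(x, weights, labels):
    d = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in weights]
    c = min(range(len(weights)), key=d.__getitem__)
    return labels[c]

weights = [[0.0, 0.0], [1.0, 1.0]]
labels = ["low", "high"]
print(classify([0.2, 0.1], weights, labels))  # "low"
```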
V. PROPERTIES OF SOM
After the convergence of the SOM algorithm, the resulting
feature map displays important statistical characteristics
of the input space. SOMs are also able to discover relevant
patterns or features present in the input data.
A. Important Properties of SOMs
SOMs have four important properties, [3,7]:
1. Approximation of the Input Space
The resulting mapping provides a good approximation
to the input space. The SOM also performs dimensionality
reduction by mapping multidimensional data onto the SOM grid.
2. Topological Ordering
Spatial locations of the neurons in the SOM lattice are
topologically related to the features of the input space.
3. Density Matching
The density of the output neurons in the map
approximates the statistical distribution of the input
space. Regions of the input space that contain more
training vectors are represented with more output neurons.
4. Feature Selection
The map extracts the principal features of the input space. It
is capable of selecting the best features for approximation
of the underlying statistical distribution of the input space.
B. Representing the Input Space with SOMs of Various
Topologies
1. 1-D
2D input data points are uniformly distributed in a
triangle; the 1D SOM ordering process is shown in Fig. 9, [2].
Figure 9. 2D to 1D mapping by a SOM (ordering process), [2]
2. 2-D
2D input data points are uniformly distributed in a
square; the 2D SOM ordering process is shown in Fig. 10, [3].
Figure 10. 2D to 2D mapping by a SOM (ordering process), [3]
3. Torus SOMs
In a conventional SOM, the size of the neighborhood set is
not always constant because the map has edges. This
problem can be mitigated by the use of a torus SOM, which
has no edges, [25]. However, the torus SOM, Fig. 11, is not
easy to visualize precisely because the edges are missing.
Figure 11. Torus SOM
4. Hierarchical SOMs
Besides the previous topologies, hierarchical SOMs should
also be mentioned. Hierarchical neural networks are
composed of multiple loosely connected neural networks
that form an acyclic graph. The outputs of the lower level
SOMs can be used as the input for the higher level SOM,
Fig. 12, [10]. Such input can be formed of several vectors
from Best Matching Units (BMUs) of many SOMs.
Figure 12. Hierarchical SOM, [10]
VI. APPLICATIONS
Despite their simplicity, SOMs can be used for various
classes of applications, [2,26,27]. In a broad sense this
includes visualization, generation of feature maps,
pattern recognition and classification. Kohonen in [2]
came up with the following categories of applications:
machine vision and image analysis, optical character
recognition and script reading, speech analysis and
recognition, acoustic and musical studies, signal
processing and radar measurements, telecommunications,
industrial and other real world measurements, process
control, robotics, chemistry, physics, design of electronic
circuits, medical applications without image processing,
data processing, linguistic and AI problems, mathematical
problems and neurophysiological research. From such an
exhaustive list it is possible here, as space permits, to
mention only some applications that are interesting
and popular.
A. Speech Recognition
The neural phonetic typewriter for Finnish and
Japanese speech was developed by Kohonen in 1988,
[28]. The signal from the microphone proceeds to acoustic
preprocessing, shown in more detail in Fig. 13, forming a
15-component pattern vector (values in 15 frequency
bins taken every 10 ms) containing a short-time spectral
description of speech. These vectors are presented to a
SOM with a hexagonal lattice of size 8 × 12.
Figure 13. Acoustic preprocessing
The phonotopic map resulting after training is shown in
Fig. 14, [7]. During speech recognition, new pattern
vectors are assigned the category belonging to the closest
prototype in the map.
Figure 14. Phonotopic map, [7]
B. Text Clustering
Text clustering is the technology of processing a large
number of texts that produces a partition of them. Preparation
of text for SOM analysis is shown in Fig. 15, [29], and
Figure 15. Preparation of text for SOM analysis, according to [29]
Figure 16. Framework for text clustering, [29]
complete framework in Fig. 16, [29]. Massive document
collections can be organized using a SOM. It can be
optimized to map large document collections while
preserving much of the classification accuracy. Clustering
of scientific articles is illustrated in Fig. 17, [30].
Figure 17. Clustering of scientific articles, [30]
C. Application in Chemistry
SOMs have found applications in chemistry.
Illustration of the output layer of the SOM model using a
hexagonal grid for the combinatorial design of
cannabinoid compounds is shown in Fig. 18, [11].
Figure 18. Application of SOM in chemistry, [11]
D. Medical Imaging and Analysis
Recognition of diseases from medical images (ECG,
CAT scans, ultrasonic scans, etc.) can be performed by
SOMs, [21]. This includes image segmentation, Fig. 19,
[31], to discover regions of interest and help diagnostics.
Figure 19. Segmentation of hip image using SOM, [31]
E. Maritime Applications
SOMs have been widely used for maritime applications,
[22]. One example is the analysis of passive sonar recordings.
SOMs have also been used for planning ship trajectories.
F. Robotics
Some applications of SOMs are control of a robot arm,
learning the motion map and solving the traveling salesman
problem (a multi-goal path planning problem), Fig. 20, [32].
Figure 20. Traveling Salesman Problem, [32]
G. Classification of Satellite Images
SOMs can be used for interpreting satellite imagery,
e.g., for land cover classification. Dust sources can also be
spotted in images using the SOM, as shown in Fig. 21, [33].
Figure 21. Detecting dust sources using SOMs, [33]
H. Psycholinguistic Studies
One example is the categorization of words by their
local context in three word sentences of the type subject-
predicate-object or subject-predicate-predicative that
were constructed artificially. The words become clustered
by SOM according to their linguistic roles in an orderly
fashion, Fig. 22, [18].
Figure 22. SOM in psycholinguistic studies, [18]
I. Exploring Music Collections
Similarity of music recordings may be determined by
analyzing the lyrics, instrumentation, melody, rhythm,
artists, or the emotions they evoke, Fig. 23, [34].
Figure 23. Exploring music collections, [34]
J. Business Applications
Customer segmentation of the international tourist
market is illustrated in Fig. 24, [35]. Another example is
classifying world poverty (a welfare map), [36]. Ordering
of items with respect to 39 features describing various
quality-of-life factors, such as state of health, nutrition and
educational services, is shown in Fig. 25. Countries with
similar quality-of-life factors cluster together on the map.
Figure 24. Customer segmentation of the international tourist market,[35]
Figure 25. Poverty map based on 39 indicators from World Bank
statistics (1992), [36]
VII. CONCLUSION
Self-organizing maps (SOMs) are a neural network
architecture inspired by the biological structure of human
and animal brains. They have become one of the most popular
neural network architectures. SOMs learn without an external
teacher, i.e., they employ unsupervised learning. Topologically,
SOMs most often use a two-dimensional grid, although
one-dimensional, higher-dimensional and irregular grids
are also possible. A SOM maps higher-dimensional input
onto a lower-dimensional grid while preserving the
topological ordering present in the input space. During
competitive learning the SOM uses lateral interactions among
the neurons to form a semantic map where similar
patterns are mapped closer together than dissimilar ones.
SOMs can be used for a broad range of applications such as
visualization, generation of feature maps, pattern
recognition and classification. Humans cannot visualize
high-dimensional data, hence SOMs, by mapping such
data to a two-dimensional grid, are widely used for data
visualization. SOMs are also suitable for the generation of
feature maps. Because they can detect clusters of similar
patterns without supervision, SOMs are a powerful tool
for the identification and classification of spatio-temporal
patterns. SOMs can be used as an analytical tool, but also
in a myriad of real-world applications including science,
medicine, satellite imaging and industry.
REFERENCES
[1] T. Kohonen, “Self-organized formation of topologically correct
feature maps”, Biol. Cybern. 43, pp. 59-69, 1982
[2] T. Kohonen, Self-Organizing Maps, 2nd ed., Springer 1997
[3] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd
ed., Prentice Hall PTR Upper Saddle River, NJ, USA, 1998
[4] K. Gurney, An introduction to neural network, UCL Press
Limited, London, UK, 1997
[5] D. Kriesel, A Brief Introduction to Neural Networks,
http://www.dkriesel.com
[6] R. Rojas: Neural Networks, A Systematic Introduction, Springer-
Verlag, Berlin, 1996
[7] J. A. Bullinaria, Introduction to Neural Networks - Course
Material and Useful Links, http://www.cs.bham.ac.uk/~jxb/NN/
[8] C. M. Bishop, Neural Networks for Pattern Recognition,
Clarendon Press, Oxford, 1997
[9] R. Eckmiller, C. Malsburg, Neural Computers, NATO ASI Series,
Computer and Systems Sciences, 1988
[10] P. Hodju and J. Halme, Neural Networks Information Homepage,
http://phodju.mbnet.fi/nenet/SiteMap/SiteMap.html
[11] Káthia Maria Honório and A. B. F. da Silva, “Applications of
artificial neural networks in chemical problems”, in Artificial
Neural Networks - Architectures and Applications, InTech, 2013
[12] W. Banzhafl, “Self-organizing systems”, in Encyclopedia of
Complexity and Systems Science, 2009, Springer, Heidelberg,
[13] A. M. Turing, “The chemical basis of morphogenesis”,
Philosophical Transactions of the Royal Society of London. Series
B, Biological Sciences, Vol. 237, No.641. pp.37-72, Aug. 14, 1952
[14] W. R. Ashby, “Principles of the self-organizing system”, E:CO
Special Double Issue Vol. 6, No. 1-2, pp. 102-126, 2004
[15] C. Fuchs, “Self-organizing system”, in Encyclopedia of
Governance, Vol. 2, SAGE Publications, 2006, pp. 863-864
[16] J. Howard, “Self-organisation in biology”, in Research
Perspectives 2010+ of the Max Planck Society, 2010, pp. 28-29
[17] The Wizard of Ads Brain Map - Wernicke and Broca,
https://www.wizardofads.com.au/brain-map-brocas-area/
[18] T. Kohonen, MATLAB Implementations and Applications of the
Self-Organizing Map, Unigrafia, Helsinki, Finland, 2014
[19] Bill Wilson, Self-organisation Notes, 2010,
www.cse.unsw.edu.au/~billw/cs9444/selforganising-10-4up.pdf
[20] J. Boedecker, Self-Organizing Map (SOM), .ppt, Machine
Learning, Summer 2015, Machine Learning Lab, Univ. of Freiburg
[21] L. Grajciarova, J. Mares, P. Dvorak and A. Prochazka, Biomedical
image analysis using self-organizing maps, Matlab Conference 2012
[22] V. J. A. S. Lobo, “Application of Self-Organizing Maps to the
Maritime Environment”, Proc. IF&GIS 2009, 20 May 2009, St.
Petersburg, Russia, pp. 19-36
[23] A. A. Akinduko and E. M. Mirkes, “Initialization of self-organizing
maps: principal components versus random initialization. A case
study”, Information Sciences, Vol. 364, Is. C, pp. 213-221, Oct. 2016
[24] MathWorks, Self-Organizing Maps,
https://www.mathworks.com/help/nnet/ug/cluster-with-self-
organizing-map-neural-network.html
[25] M. Ito, T. Miyoshi, and H. Masuyama, “The characteristics of the
torus self organizing map”, Proc. 6th Int. Conf. on Soft Computing
(IIZUKA’2000), Iizuka, Fukuoka, Japan, Oct. 1-4, 2000, pp. 239-244
[26] M. Johnsson ed., Applications of Self-Organizing Maps, InTech,
November 21, 2012
[27] J. I. Mwasiagi (ed.), Self Organizing Maps - Applications and
Novel Algorithm Design, InTech, 2011
[28] T. Kohonen, “The ‘neural’ phonetic typewriter”, IEEE Computer
21(3), pp. 11–22, 1988
[29] Yuan-Chao Liu, Ming Liu and Xiao-Long Wang, “Application of
self-organizing maps in text clustering: a review”, in Self Organizing
Maps - Applications and Novel Algorithm Design, InTech, 2012
[30] K. W. Boyack et al., Supplementary information on data and
methods for “Clustering more than two million biomedical
publications: comparing the accuracies of nine text-based
similarity approaches”, PLoS ONE 6(3): e18029, 2011
[31] A. Aslantas, D. Emre and M. Çakiroğlu, “Comparison of
segmentation algorithms for detection of hotspots in bone
scintigraphy images and effects on CAD systems”, Biomedical
Research, 28 (2), pp. 676-683, 2017
[32] J. Faigl, “Multi-goal path planning for cooperative sensing”, PhD
Thesis, Czech Technical University of Prague, February 2010
[33] D. Lairy, Machine Learning for Scientific Applications, slides,
https://www.slideshare.net/davidlary/machine-learning-for-scientific-applications
[34] E. Pampalk, S. Dixon and G. Widmer, “Exploring music
collections by browsing different views”, Computer Musical
Journal, Vol. 28, No. 2, pp. 49-62, Summer 2004
[35] J. Z. Bloom, “Market segmentation - a neural network application”,
Annals of Tourism Research, Vol. 32, No. 1, pp. 93–111, 2005
[36] World Poverty Map, SOM research page, Univ. of Helsinki,
http://www.cis.hut.fi/research/som-research/worldmap.html
24. Convolutional Neural Network (CNN)
A Convolutional Neural Network (CNN) is a class of ANNs.
The development of CNNs was primarily triggered by the challenges of
image recognition.
CNN architectures are strongly influenced by current neuroscience models
of the organization of human and animal visual perception.
The central convolution mechanisms of CNNs are inspired by receptive fields
and their direct connections to specific neuron structures.
The implementation of these mechanisms is based on the concept of the
convolution function in mathematics.
CNNs use relatively little pre-processing compared to other image
classification algorithms. This means that the network learns the filters that in
traditional algorithms were hand-engineered. This independence from prior
knowledge and human effort in feature design is a major advantage.
25. Image Recognition
The classical problem in computer vision is that of determining whether or not the
image data contains some specific object, feature, or activity. Different varieties of the
recognition problem are:
Object recognition or object classification – one or several pre-specified or learned
objects or object classes can be recognized, usually together with their 2D positions in
the image or 3D poses in the scene.
Identification – an individual instance of an object is recognized. Examples include
identification of a specific person's face or fingerprint, identification of handwritten
digits or letters or identification of a specific object.
Detection – the image data are scanned for a specific condition. Examples include
detection of possible abnormal cells or tissues in medical images or detection of a
vehicle in an automatic road toll system. Detection based on relatively simple and fast
computations is sometimes used for finding smaller regions of interesting image data
which can be further analyzed by more computationally demanding techniques to
produce a correct interpretation.
26. ImageNet
The ImageNet project is a large visual database designed for use in visual
object recognition software research.
More than 14 million images have been hand-annotated by the project to
indicate what objects are pictured.
ImageNet contains more than 20,000 categories with a typical category
consisting of several hundred images.
Since 2010, the ImageNet project runs an annual software contest, the
ImageNet Large Scale Visual Recognition Challenge (ILSVRC), where
software programs compete to correctly classify and detect objects and
scenes. The challenge uses a specially selected list of one thousand non-
overlapping classes.
27. Image Recognition Systems
[Diagram: image input in standard pixel form is mapped to a compact
symbolic characterization of the image as output. The classical ANN
architecture relies on manual mapping (hand-engineered features),
while the CNN architecture and alternative architectures perform
automated mapping.]
28. Input to Image Recognition systems - finite arrays of pixels
29. RGB Images
An RGB image, sometimes referred to as a true-color image, is an
m-by-n-by-3 data array RGB(.., .., ..) that defines red, green, and blue
color components for each individual pixel. The color of each pixel is
determined by the combination of the red, green, and blue intensities
stored in each color plane at the pixel's location.
An RGB color component is a value between 0 and 1. A pixel whose
color components are (0,0,0) displays as black, and a pixel whose
color components are (1,1,1) displays as white.
The three color components for each pixel are stored along the third
dimension of the data array.
For example, the red, green, and blue color components of the pixel
(2,3) are stored in RGB(2,3,1), RGB(2,3,2), and RGB(2,3,3),
respectively. Suppose RGB(2,3,1) contains the value 0.5176, RGB(2,3,2)
contains 0.1608, and RGB(2,3,3) contains 0.0627. The color of the pixel
at (2,3) is then (0.5176, 0.1608, 0.0627).
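The same m-by-n-by-3 layout can be sketched with nested Python lists (the 4×5 image size is arbitrary; indices are 0-based here, unlike the 1-based MATLAB-style notation above):

```python
# An m-by-n-by-3 image as nested lists: image[row][col] holds the
# (R, G, B) triple for that pixel. The sample values are the ones
# quoted in the text for the pixel at row 2, column 3 (1-based),
# i.e. image[1][2] in 0-based indexing.

m, n = 4, 5
image = [[[0.0, 0.0, 0.0] for _ in range(n)] for _ in range(m)]
image[1][2] = [0.5176, 0.1608, 0.0627]

r, g, b = image[1][2]
print(r, g, b)  # 0.5176 0.1608 0.0627
```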
30. Output from an Image Recognition system
One or several object categories (classes) present in the image
Specific objects (instances) present in the image
Subset of features of object and/or categories observable in the image
Topological and Geometrical aspects of the image
Dynamic properties of elements in the image (requires sequences of images)
All of the above elements can be represented in symbolic and numeric form.
A feature vector is still the default option.
32. The Organization of the Visual Cortex
[Diagram: the visual pathway leads from the eye via the superior
colliculus and the dorsal LGN to the striate cortex (V1), on to the
extrastriate cortex (V2, V3, V3A, V4, V5) and the inferior temporal
cortex (TEO, TE). A dorsal stream runs toward the posterior parietal
cortex and a ventral stream toward the inferior temporal cortex.]
STS Superior temporal sulcus
TEO Inferior temporal cortex
TE Inferior temporal cortex
33. The connections between Receptive fields
and Neurons in the Visual Cortex
Nobel prize awarded work by Hubel and Wiesel in the 1950s and 1960s showed that cat and monkey
visual cortices contain neurons that individually respond to small regions of the visual field.
Provided the eyes are not moving, the regions of visual space within which visual stimuli affect the firing of
single neurons are called receptive fields.
Neighboring neurons have similar and overlapping receptive fields.
Receptive field sizes and locations vary systematically to form a complete map of visual space. The response
of a specific neuron to a subset of stimuli within its receptive field is called neuronal tuning.
A 1968 article by Hubel and Wiesel identified two basic visual cell types in the brain:
• simple cells, whose output is maximized by straight edges having particular orientations within their
receptive field. Neurons of this kind are located in the earlier visual areas (like V1).
• complex cells, which have larger receptive fields and whose output is insensitive to the exact position of the
edges in the field. In the higher visual areas, neurons have complex tuning. For example, in the inferior
temporal cortex, a neuron may fire only when a certain face appears in its receptive field.
Hubel and Wiesel also proposed a cascading model of these two types of cells for use in pattern recognition tasks.
34. Convolution as defined in Mathematics
Convolution is a mathematical operation on two functions
(f and g) to produce a third function that expresses how
the shape of one is modified by the other.
• Express each function in terms of a dummy variable a.
• Reflect one of the functions: g(a) → g(-a)
• Add a time-offset, x, which allows g to slide along
the a-axis from −∞ to +∞.
• Wherever the two functions intersect, find the integral
of their product.
• In other words, compute a sliding, weighted-sum of
function f(a) where the weighting function is g(-a)
• The resulting waveform is the convolution of
functions f and g.
The term convolution refers to both the result function and
to the process of computing it. Convolution is similar to
cross-correlation and related to autocorrelation.
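The sliding weighted sum described above has a direct discrete analogue, (f * g)[x] = Σₐ f[a]·g[x − a], sketched here in pure Python with made-up sequences:

```python
# Discrete convolution of two finite sequences f and g:
# (f * g)[x] = sum over a of f[a] * g[x - a].
# The result has length len(f) + len(g) - 1.

def convolve(f, g):
    out = [0.0] * (len(f) + len(g) - 1)
    for x in range(len(out)):
        for a in range(len(f)):
            if 0 <= x - a < len(g):   # only where the sequences overlap
                out[x] += f[a] * g[x - a]
    return out

print(convolve([1, 1, 1], [1, 2]))  # [1.0, 3.0, 3.0, 2.0]
```

Swapping f and g gives the same result, illustrating that convolution is commutative.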
35. Example: compute the convolution of f and g, f*g
[Figure: plots of two pulse functions f and g; the weight function g
is reflected and slid along the axis by the offset x; wherever the
functions overlap, f*g is the integral of their product, and outside
the overlap f*g = 0; the final panel shows the resulting waveform.]
37. Convolutional Neural Network related Terminology
Convolution
Filter (or synonymously Kernel)
Stride
Padding
Feature map
Parameter sharing
Local connectivity
Pooling / Subsampling / Downsampling
Subsampling ratio
Max pooling / Average pooling
ReLU
Softmax
38. The Feature Learning Phase
The feature learning phase in a CNN consists of an arbitrary number of
pairs of Convolution and Pooling layers.
The number and roles of these pairs of layers are engineering decisions for particular problem settings, but in
general later (deeper) levels handle more abstract or high-level features or patterns, in analogy with our
assumed model of the functioning of the human visual cortex.
40. Convolution for one Filter in a Convolution Layer
In our example we take a 5x5x3 filter and slide it over the
input array with a stride of 1.
Let us disregard the color dimension for a moment.
In each step of the slide, take the dot product between the
filter elements and the elements of the corresponding
subarea of the input array. Every such dot product yields a
scalar.
There are 28x28 unique positions where the filter can be
placed on the image, and therefore the total result is a
feature map: a 28x28x1 array.
If the stride is larger than 1, the feature map becomes
smaller.
41. Example of a filter and a single convolution operation
The input is a 7x7 array with 49 elements. The filter, shown in the middle, has
size 3x3 (black and white), and the stride is 1. This is an example of a filter
that detects diagonal patterns (1s on the diagonal). The output is a 5x5 array with
25 elements.
We slide the filter systematically across the input array (in analogy with convolution).
There are 25 distinct sliding positions. For each position we calculate the
elementwise dot product and put the result in the output matrix.
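As a sketch of this example, the following NumPy snippet slides a 3x3 diagonal filter over a 7x7 input with stride 1. Note that, as is common in CNN practice, the filter is not reflected, so strictly speaking this is cross-correlation; the input values here are illustrative, not the slide's exact figures:

```python
import numpy as np

# A 3x3 filter that responds to diagonal patterns (1s on the main diagonal)
kernel = np.eye(3)

def conv2d_valid(image, kernel, stride=1):
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1
    ow = (iw - kw) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)  # elementwise dot product
    return out

image = np.eye(7)                # a 7x7 input containing a diagonal line
fmap = conv2d_valid(image, kernel)
print(fmap.shape)                # (5, 5): 25 distinct sliding positions
```

The feature map takes its largest values where the input's diagonal lines up with the filter's diagonal.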
42. Padding
Depending on the size of the input array, the size of the filters and
the stride, the sliding process can fail to apply the filter to
some input array elements.
A possibility is to 'pad' the original input array with a frame
and use the extended array as the basis for the convolution.
Whether this is beneficial for the process or not depends on
the specific situation.
If padding is never used, the arrays shrink rapidly, but if
padding is used systematically, the size of the arrays is
kept up.
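A quick NumPy sketch of zero-padding, and of the 'same' padding choice that preserves the output size; the output-size formula used here is the standard one, not stated on the slide:

```python
import numpy as np

image = np.zeros((7, 7))
padded = np.pad(image, pad_width=1)   # add a one-pixel frame of zeros
print(padded.shape)                   # (9, 9)

# 'Same' padding for a k x k filter at stride 1 keeps the output size:
k = 3
pad = (k - 1) // 2                    # = 1 for a 3x3 filter
out_size = (7 + 2 * pad - k) // 1 + 1
print(out_size)                       # 7: the array no longer shrinks
```

Without the frame, the same filter would produce a 5x5 output, illustrating the shrinking the slide describes.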
43. Repeated convolution for all
filters in a convolution layer
Each convolution layer comprises a set of independent
filters.
Each filter is independently convolved with the input
image.
In the example there are 6 filters in this first convolution
layer, which generates 6 feature maps, each of shape
28x28x1.
44. Pooling (Subsampling) Layer
A pooling layer is frequently used in a convolutional neural
network with the purpose of progressively reducing the spatial
size of the representation, to reduce the number of features and
the computational complexity of the network.
A pooling layer operates on each feature map independently.
A main reason for the pooling layer is to prevent the model
from overfitting.
The choice of filter size, stride (and possibly padding) is also
relevant for the pooling phases.
The most common approach used in pooling is max pooling.
As an example, a MAXPOOL of 2x2 would cause a filter of 2
by 2 to traverse the entire matrix with a stride of 2 and
pick the largest element from each window to be included in the
next representation map. Average pooling takes the average instead.
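The 2x2, stride-2 example above can be sketched as follows; the `op` parameter (our own generalization) switches between max and average pooling:

```python
import numpy as np

def pool2d(fmap, size=2, stride=2, op=np.max):
    oh = (fmap.shape[0] - size) // stride + 1
    ow = (fmap.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            window = fmap[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = op(window)   # np.max -> max pooling, np.mean -> average
    return out

fmap = np.array([[1, 3, 2, 4],
                 [5, 6, 7, 8],
                 [3, 2, 1, 0],
                 [1, 2, 3, 4]], dtype=float)
print(pool2d(fmap))                  # [[6. 8.], [3. 4.]]
```

A 4x4 feature map is reduced to 2x2: each entry is the largest value in one non-overlapping 2x2 window.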
45. Two aspects of the neuron structures
in the convolution and pooling layers
Weight sharing
Based on the motivation that a certain feature/filter should treat all subareas of the visual
space similarly, the same weights should be employed within a convolution computation
phase. This brings down the complexity of the network.
Local connectivity
In contrast to a general ANN, the neuron connections in the input, convolution and pooling
layers are restricted, primarily motivated by the fact that specific neurons are allocated to only
small sub-areas of the total visual field. This also brings down the complexity of the network.
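To see how much weight sharing and local connectivity reduce complexity, compare the parameter count of a single shared 5x5 filter against a hypothetical fully connected mapping of the same 32x32 input to a 28x28 output (the layer sizes are borrowed from the earlier convolution example; the comparison itself is ours):

```python
# One 5x5 filter shared across all positions of a 32x32 input:
conv_params = 5 * 5 + 1                # 26 weights (incl. bias), reused everywhere

# A fully connected layer mapping 32x32 inputs to a 28x28 output:
fc_params = (32 * 32) * (28 * 28)      # 802,816 weights, no sharing
print(conv_params, fc_params)
```

The shared filter needs roughly 30,000 times fewer parameters than the dense alternative for the same input and output sizes.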
46. Flattening
When leaving the convolution and pooling
layers and before entering the fully connected
layers the output of the previous layers
is flattened.
By this is meant that the dimensions of the input
array from earlier phases are flattened out into
one large dimension.
For example, a 3-D array with a shape of
(10x10x10), when flattened, would become a 1-D
array with 1000 elements.
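The 10x10x10 example above is a one-line reshape in NumPy (a minimal sketch):

```python
import numpy as np

volume = np.arange(1000).reshape(10, 10, 10)   # 3-D output of earlier layers
flat = volume.reshape(-1)                      # flatten into one large dimension
print(flat.shape)                              # (1000,)
```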
47. The Fully Connected Layers
The fully connected layers take as input a
flattened array representing the activation maps
of high-level features from earlier layers, and
output an N-dimensional vector.
N is the number of classes that the program has to
choose from. For example, if the task is digit
classification, N would be 10 since there are 10
digits.
The fully connected layers determine which
features best correlate to a particular class.
If a Softmax activation function is used, each number in this N-dimensional vector
represents the probability of a certain class.
For example, if the resulting vector for a digit classification program is [0, 0.1,
0.1, 0.75, 0, 0, 0, 0, 0, 0.05], then this represents a 10% probability that the
image is a 1, a 10% probability that the image is a 2, a 75% probability that the
image is a 3, and a 5% probability that the image is a 9.
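Reading off the predicted class from the example vector is a simple argmax (a minimal sketch using the slide's numbers):

```python
import numpy as np

# The example output vector for a 10-class digit problem
probs = np.array([0, 0.1, 0.1, 0.75, 0, 0, 0, 0, 0, 0.05])
predicted_class = int(np.argmax(probs))
print(predicted_class)     # 3: the digit with the highest probability
```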
48. Activation functions used
ReLU (Rectified Linear Unit) and Leaky ReLU activation functions
The advantages are simplicity and efficiency.
Typically used in the convolution layers of a CNN.
Sigmoid and hyperbolic tangent (Tanh) functions
Sigmoid and Tanh are typically used in fully connected networks aimed at binary
classification problems. They can be used for the output layers of a CNN.
Softmax
Softmax is equivalent to Sigmoid for binary classification but is primarily aimed at
the multi-class case, where the non-normalized output of a network is mapped onto a
probability distribution over predicted output classes.
Typically used in the output layers of a CNN.
Gaussian activation function
Can be used for the output layers of a CNN.
For regression problems, the final layer typically has an identity activation.
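Minimal NumPy definitions of the activation functions listed above. These are the standard textbook forms; the max-subtraction in softmax is a common numerical-stability detail, not something the slide specifies:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)   # small slope for negative inputs

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))              # shift by max for numerical stability
    return e / e.sum()                     # maps logits onto a probability distribution

z = np.array([-1.0, 0.0, 2.0])
print(relu(z))                             # [0. 0. 2.]
print(softmax(z))                          # non-negative entries summing to 1
```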
49. LeNet-5 – A Classic CNN Architecture
In 1998 Yann LeCun, Leon Bottou, Yoshua Bengio and Patrick Haffner proposed a neural network architecture for
handwritten and machine-printed character recognition which they called LeNet-5. The architecture is
straightforward, simple to understand, and well suited as an introduction to CNNs.
The LeNet-5 architecture consists of:
• two sets of convolutional and average pooling (subsampling) layers followed by
• one flattening convolutional layer, then
• two fully-connected layers and finally
• one Softmax classifier.
50. First Layer
The input for LeNet-5 is a 32×32 grayscale
image which passes through the first
convolutional layer with 6 feature maps or
filters having size 5×5 and a stride of one. The
image dimensions change from 32x32x1 to
28x28x6.
Second Layer
Then the LeNet-5 applies average pooling layer
or sub-sampling layer with a filter size 2×2 and
a stride of two. The resulting image dimensions
will be reduced to 14x14x6.
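The layer sizes quoted above follow the standard output-size formula (W − F + 2P)/S + 1; a small helper (our own, for illustration) reproduces them:

```python
def conv_out_size(w, f, stride=1, pad=0):
    # Standard formula: output width = (W - F + 2P) / S + 1
    return (w - f + 2 * pad) // stride + 1

print(conv_out_size(32, 5, stride=1))    # 28: first convolutional layer (C1)
print(conv_out_size(28, 2, stride=2))    # 14: first pooling layer (S2)
```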
51. Third Layer
Next, there is a second convolutional layer with 16 feature maps having size
5×5 and a stride of 1. In this layer, each of the 16 feature maps is
connected to only a subset of the 6 feature maps of the previous layer. The main
reasons are to break the symmetry in the network and to keep the number of
connections within reasonable bounds. That is why the number of trainable
parameters in this layer is 1,516 instead of 2,400 and, similarly, the number
of connections is 151,600 instead of 240,000.
Fourth Layer
The fourth layer (S4) is again an average pooling layer with filter size 2×2
and a stride of 2. This layer is the same as the second layer (S2) except it has
16 feature maps so the output will be reduced to 5x5x16.
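The counts 1,516 and 151,600 for the third layer (C3) can be reproduced from the connection table in the original LeNet-5 paper: the first 6 maps each see 3 input maps, the next 9 see 4, and the last sees all 6. The grouping below is taken from that paper, not from the slide:

```python
# C3 connection scheme (LeCun et al., 1998):
# (number of C3 maps, number of S2 input maps each sees)
groups = [(6, 3), (9, 4), (1, 6)]

# Each map has one 5x5 kernel per input map it sees, plus one bias
params = sum(n_maps * (n_inputs * 5 * 5 + 1) for n_maps, n_inputs in groups)
connections = params * 10 * 10     # each weight is applied at 10x10 output positions
print(params, connections)         # 1516 151600
```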
52. Fifth Layer
The fifth layer (C5) is a fully connected convolutional layer with
120 feature maps each of size 1×1. Each of the 120 units in C5 is
connected to all the 400 nodes (5x5x16) in the fourth layer S4.
Sixth Layer
The sixth layer is a fully connected layer (F6) with 84 units.
Output Layer
Finally, there is a fully connected Softmax output layer ŷ with 10
possible values corresponding to the digits from 0 to 9.
[Figure: LeNet-5 layers]
54. Timeline for CNN
1980 The Neocognitron, introduced by Kunihiko Fukushima.
1987 Time delay neural networks (TDNN), introduced by Alex Waibel.
1989 A system to recognize hand-written ZIP Code numbers using convolutions based on
laboriously hand-designed filters, introduced by Yann LeCun.
1998 LeNet-5, a pioneering 7-level convolutional network by Yann LeCun.
2006 The first GPU implementation of a CNN, described by K. Chellapilla.
2012 AlexNet, a GPU-based CNN by Alex Krizhevsky, won the ImageNet Large Scale Visual
Recognition Challenge.