THE DEVELOPMENT OF NEURAL NETWORK BASED
VISION FOR AN AUTONOMOUS VEHICLE
BY
AKINOLA Otitoaleke Gideon
(EEG/2006/034)
A DISSERTATION SUBMITTED TO THE
DEPARTMENT OF ELECTRONIC AND ELECTRICAL ENGINEERING,
FACULTY OF TECHNOLOGY,
OBAFEMI AWOLOWO UNIVERSITY, ILE-IFE, NIGERIA
IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE AWARD OF THE
BACHELOR OF SCIENCE (HONOURS) DEGREE IN ELECTRONIC AND
ELECTRICAL ENGINEERING.
JANUARY, 2012.
Department of Electronic and Electrical Engineering,
Faculty of Technology,
Obafemi Awolowo University,
Ile-Ife, Osun State.
24th January, 2012.
The Coordinator,
Final Year Project,
Department of Electronic and Electrical Engineering,
Obafemi Awolowo University,
Ile-Ife, Osun State.
Dear Sir,
LETTER OF TRANSMITTAL
Kindly accept the accompanying copy of the dissertation on my final year project
titled “The Development of Neural Network Based Vision for an Autonomous Vehicle”.
This was undertaken in partial fulfilment of the requirements for the award of a Bachelor
of Science (B.Sc.) degree from the Department of Electronic and Electrical Engineering,
Obafemi Awolowo University.
Thank You.
Yours Sincerely,
AKINOLA Otitoaleke Gideon
EEG/2006/034
CERTIFICATION
This is to certify that this report was prepared by AKINOLA Otitoaleke Gideon
(EEG/2006/034) in accordance with the requirements stipulated for the execution of a
final year project, in partial fulfilment of the requirements for the award of a Bachelor of
Science (B.Sc.) degree in Electronic and Electrical Engineering.
___________________
Dr. K.P. Ayodele
(Project Supervisor)
ACKNOWLEDGEMENT
I give my heartfelt thanks to my supervisor, Dr. Kayode AYODELE, for providing
the direction and materials needed for this project. I also thank my family
members for their support and understanding.
Above all, I am indeed grateful to God Almighty for the progress made so far. It
could only have been possible by His grace and help.
TABLE OF CONTENTS
LETTER OF TRANSMITTAL ....................................................................................................... ii
CERTIFICATION .......................................................................................................................... iii
ACKNOWLEDGEMENT .............................................................................................................. iv
ABSTRACT .................................................................................................................................. viii
LIST OF FIGURES ..........................................................................................................................x
LIST OF TABLES .......................................................................................................................... xi
CHAPTER ONE ...............................................................................................................................1
INTRODUCTION ............................................................................................................................1
1.1. Background .......................................................................................................................1
1.2. Project Description............................................................................................................4
1.3. Objective of the Project.....................................................................................................5
1.4. Scope .................................................................................................................................5
1.5. Justification .......................................................................................................................5
CHAPTER TWO ..............................................................................................................................7
LITERATURE REVIEW .................................................................................................................7
2.1. Mobile Robots ...................................................................................................................7
2.1.1. Types of Mobile Robots ............................................................................................7
2.1.2. Systems and Methods for Mobile Robot Navigation ................................................9
2.1.3. Mobile Robots and Artificial Intelligence...............................................................11
2.2. Autonomous Vehicles .....................................................................................................14
2.2.1. Road Recognition and Following............................................................................14
2.3. Vision and Image Processing ..........................................................................................18
2.3.1. Pixels: ......................................................................................................................19
2.3.2. Indexed Images .......................................................................................................19
2.3.3. Methods in Image Processing .................................................................................20
2.4. Artificial Neural Networks..............................................................................................22
2.4.1. Introduction .............................................................................................................22
2.4.2. Neural Network Architecture ..................................................................................25
2.4.3. Transfer Functions ..................................................................................................27
2.4.4. Learning rules .........................................................................................................28
2.4.5. Some important neural network paradigms ............................................................29
2.4.6. Validation ................................................................................................................34
2.4.7. Real life applications ...............................................................................................35
2.4.8. Case Study of Two Neural Network Paradigms .....................................................36
CHAPTER THREE ........................................................................................................................42
METHODOLOGY .........................................................................................................................42
3.1. Chassis Fabrication .........................................................................................................42
3.2. Vision, Image Acquisition and Processing .....................................................................44
3.3. Developing the Neural Network with Multi-Layer Perceptron (Method 1) ...................47
3.4. Developing the Neural Network with Self Organizing Maps (Preferred Method) .........49
3.5. Scene Identification.........................................................................................................50
3.6. Locomotion .....................................................................................................................50
3.7. Steering ...........................................................................................................................51
CHAPTER FOUR ...........................................................................................................................52
RESULTS AND DISCUSSIONS ...................................................................................................52
5.1. Simulations and Results ..................................................................................................52
5.1.1. Multi-Layer Perceptron Neural Network ................................................................52
5.1.2. Self-Organizing Maps (SOM) Neural Network ......................................................53
5.2. Observations ...................................................................................................................57
5.2.1. Front wheel actuation. .................................................................................................57
5.2.2. Processing requirement. ..............................................................................................57
5.2.3. Self-Organizing Map: A better Neural Network paradigm. ........................................57
CHAPTER FIVE ............................................................................................................................58
CONCLUSIONS AND RECOMMENDATIONS .........................................................................58
5.1. Conclusion ......................................................................................................................58
5.2. Recommendations ...........................................................................................................58
5.3. Further Works To Be Done .............................................................................................59
5.3.1. Advanced Image Processing ...................................................................................59
5.3.2. Path Planning ..........................................................................................................60
REFERENCES ...............................................................................................................................61
Appendix I: Matlab Codes for Image Acquisition and Processing .................................................66
Appendix II: Matlab Codes for Multilayer Perceptron Neural Network ........................................67
Appendix III: Matlab Codes for Self Organizing Map (SOM) Neural Network ............................70
Appendix IV: Matlab Codes Defining Functions for Various Movement Patterns ........................72
Appendix IV: Pictures Acquired For Training the Self Organizing Map (SOM) Neural Network 81
Appendix V: Self Organizing Map (SOM) Training Tool Showing 100 Iterations .......................83
Appendix VI: The Self Organizing Map (SOM) Weight Distances ...............................................84
Appendix VII: Plot Showing the SOM Sample Hits or Classes ....................................................85
Appendix VIII: Plot Showing Weight Positions of the Three Neurons ..........................................86
Appendix IX: Test Scene Images for the Neural Network .............................................................87
Appendix X: GUI showing the state of the Neural Network ..........................................................88
Appendix XI: Confusion Matrices for the Neural Network............................................................89
Appendix XII: Error and Performance Plots for the Neural Network ............................................90
Appendix XIII: Gradient Plot for each epoch of the Neural Network ............................................91
Appendix XIV: Simulation Images for the Neural Network ..........................................................92
ABSTRACT
This is a report on the development of neural network based vision for an
autonomous vehicle fabricated for my final year project. An Autonomous Vehicle is a
kind of robot that is self-acting and self-regulating and thereby reacts to its environment
without external control. The objective of the project is to implement and train an
Artificial Neural Network (ANN) to control a mobile robot based on its visual perception.
Discussions were carried out on: the types of mobile robots; systems and methods
for mobile robot navigation; and artificial intelligence methods used in mobile robot
navigation. Image processing and robot vision were also carefully considered. Special
attention was given to ANN which is an adaptive interconnected group of artificial
neurons that uses a mathematical or computational model for information processing
based on a connectionist approach to computation. The goal of the ANN is to discover
some associations between input and output patterns, or to analyse or find the structure of
the input patterns through iterative adjustment of weights of individual connection
between units.
To carry out the set objective, a model chassis for the autonomous vehicle was
fabricated. The differential-drive method was used for steering control; for driving the
wheels, servomotors (Dynamixel) were attached to the front wheels only. Also, since
autonomy is based on visual perception, a webcam is the primary sensor. This was then
integrated into Matlab for acquiring images from scenes and getting the images processed
for effective use in the Artificial Neural Network. Initially, Multi-Layer Perceptron
(MLP) was used as the Neural Network, but due to observed classification errors, the use
of Self-Organizing Maps (SOM) was then adopted. Ultimately, the outputs of the Neural
Network were used to control the car's movement and steering patterns. Analyses of the
scene-identification results obtained from the Self-Organizing Map (SOM) and the
Multi-Layer Perceptron (MLP) neural networks were carried out. These results, which are
outputs from the Neural Network, were sent as control commands to execute a particular
movement pattern.
Numerical outputs from the ANN were obtained, and their analysis was presented
in graphical form through performance and classification plots. Comparison of the results
showed that the SOM provides a better-suited Neural Network paradigm for scene
identification than the MLP. As a key result in realizing the objective of this project, the
robot was able to differentiate between three different scenes and act on them.
Although the robot can differentiate between three or more different scenes and
act on them, to achieve a satisfactory degree of autonomy, image and video processing
need to be improved upon for road identification and path-planning purposes. The robot
also needs to be given better cognitive ability by fine-tuning the neural network. Various
recommendations were given in anticipation of further work. When fully developed, the
project should serve as a model in a Control Engineering course.
LIST OF FIGURES
FIGURE 2. 1: THE BASIC NEURAL UNIT PROCESSES THE INPUT INFORMATION INTO THE
OUTPUT INFORMATION. ................................................................................................ 26
FIGURE 2. 2: DIFFERENT TYPES OF ACTIVATION FUNCTIONS: (A) THRESHOLD (B) PIECEWISE
LINEAR (C) SIGMOID AND (D) GAUSSIAN. ..................................................................... 31
FIGURE 2. 3: A TAXONOMY OF FEED-FORWARD AND RECURRENT /FEEDBACK NETWORK
ARCHITECTURES (JAIN, 1996). ..................................................................................... 32
FIGURE 2. 4: A MULTI-LAYER PERCEPTRON. ....................................................................... 33
FIGURE 3. 1: BASIC ALGORITHM FOR AUTONOMY .............................................................. 42
FIGURE 3. 2: A MORE COMPREHENSIVE ALGORITHM FOR THE ROBOT ............................... 43
FIGURE 3. 3: IMAGE SHOWING THE ROBOT CHASSIS ........................................................... 45
LIST OF TABLES
TABLE 4. 1: CORRESPONDING TARGETS AND OUTPUTS FOR TRAINING THE MULTI-
LAYER PERCEPTRON. ................................................................................................... 54
TABLE 4. 2: ERROR AND PERFORMANCE FOR TEST IMAGES CLASSIFIED. ............................. 55
TABLE 4. 3: CLASSIFICATION OF IMAGES INTO GROUPS BY THE SELF-ORGANIZING MAP. .. 56
CHAPTER ONE
INTRODUCTION
1.1. Background
An autonomous vehicle is a kind of robot that is capable of automatic navigation.
It is self-acting and self-regulating, therefore it is able to operate in and react to its
environment without outside control.
After proving to be an efficient tool for improving the quality, productivity, and
competitiveness of manufacturing organizations, robots are now expanding into service
organizations, offices, and even homes. Global competition and the drive to reduce
production costs and increase efficiency create new applications for robots that stationary
robots cannot perform. These new applications require the robots to move and perform
certain activities at the same time. Moreover, the availability and low cost of faster
processors, better programming, and the use of new hardware allow robot designers to
build more accurate, faster, and even safer robots. Currently, mobile robots are expanding
outside the confines of buildings and into rugged terrain, as well as familiar environments
like schools and city streets (Wong, 2005).
Robots must be able to understand the structure of their environment. To reach
their targets without collisions, robots must be endowed with perception, data
processing, recognition, learning, reasoning, interpreting, decision-making and action
capacities (which together constitute artificial intelligence).
Therefore, to reach a reasonable degree of autonomy, three basic requirements are
sensing, reasoning and actuation. Sensing is to be provided by a web camera that gathers
information about the robot with respect to the surrounding scene. Reasoning can be
accomplished by devising algorithms that exploit this information in order to generate
appropriate commands for the robot. Actuation is by intelligent servo-motors.
For an autonomous outdoor mobile robot, the ability to detect surrounding roads
is a vital capability. Unstructured roads are among the toughest challenges for a mobile
robot, both in terms of detection and navigation. Even though mobile robots use various
sensors to interact with their environment, the potential of cameras, a comparatively
low-cost and rich source of information, should be fully utilized (Dilan, 2010).
Road recognition using visual information is an important capability for
autonomous navigation in urban environments. Shinzato et al. (2008) presented a visual
road detection system that uses multiple Artificial Neural Networks (ANNs), similar to
MANIAC (Multiple ALVINN Networks in Autonomous Control), in order to improve
robustness. However, each ANN learns colours and textures from sub-images instead of
the whole road appearance. In their system, each ANN receives different image features
(such as averages, entropy, energy and variance from different colour channels (RGB,
HSV, and YUV)) as input from sub-images. In the final step of the algorithm, the set of
ANN outputs is combined to generate a single classification for each sub-image. This
classification provides a confidence factor for each sub-image, which can be used by the
control algorithm. The system does not need to be retrained continually; therefore, the
location of the road is not assumed.
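To make the kind of sub-image features described above concrete, the following is a minimal sketch (illustrative Python, not the Matlab pipeline of this thesis or of Shinzato et al.; the 8-bit grayscale patch given as a flat list of pixels is an assumption) computing the average, variance, energy and entropy of one sub-image:

```python
import math
from collections import Counter

def patch_features(pixels):
    """Average, variance, energy and entropy of one grayscale sub-image,
    given as a flat list of 8-bit intensity values."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    # Normalized histogram over the observed intensities.
    hist = Counter(pixels)
    probs = [c / n for c in hist.values()]
    energy = sum(p * p for p in probs)               # sum of squared probabilities
    entropy = -sum(p * math.log2(p) for p in probs)  # Shannon entropy in bits
    return {"mean": mean, "variance": var, "energy": energy, "entropy": entropy}

# A perfectly uniform patch carries no texture information:
flat = patch_features([128] * 64)     # zero variance, zero entropy, energy 1
```

A feature vector of this kind, computed per sub-image and per colour channel, is what each ANN in such a system would receive as input.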
Zhu et al. (2008) proposed an integrated system called the road understanding
neural network (RUNN) for an autonomous mobile robot moving in an outdoor road
environment. The RUNN consists of two major neural network modules: a single three-
layer road classification network (RCN) to identify the road category (straight road,
intersection or T-junction), and a two-layer road orientation network (RON) for each road
category. Several design issues, including the network model, the selection of input data,
the number of hidden units and the learning problems, were adequately dealt with.
One of the most widely used artificial neural network (ANN) models is the well-known
Multi-Layer Perceptron (MLP) (Haykin, 1998). The training process of MLPs for pattern
classification problems consists of two tasks: the first is the selection of an
appropriate architecture for the problem, and the second is the adjustment of the
connection weights of the network.
Extensive research work has been conducted to attack these issues. Global search
techniques, with the ability to broaden the search space in an attempt to avoid local
minima, have been used for connection weight adjustment or architecture optimization of
MLPs; examples include evolutionary algorithms (EA) (Eiben & Smith, 2003), simulated
annealing (SA) (Kirkpatrick et al., 1983), tabu search (TS) (Glover, 1986), ant colony
optimization (ACO) (Dorigo et al., 1996) and particle swarm optimization (PSO)
(Kennedy & Eberhart, 1995).
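As a minimal illustration of global-search weight adjustment, the sketch below (illustrative Python, not an implementation from this thesis; the single linear neuron, the data, and the cooling schedule are all invented for the example) uses simulated annealing to fit the two weights of a neuron y = w·x + b:

```python
import math
import random

def sa_fit(samples, steps=5000, temp=1.0, cool=0.999):
    """Simulated annealing over the weights (w, b) of a single linear
    neuron y = w*x + b, minimizing the sum of squared errors."""
    random.seed(0)  # deterministic for illustration

    def loss(w, b):
        return sum((w * x + b - y) ** 2 for x, y in samples)

    w, b = 0.0, 0.0
    cur = loss(w, b)
    best_w, best_b, best = w, b, cur
    for _ in range(steps):
        # Propose a small random perturbation of the weights.
        nw = w + random.uniform(-0.1, 0.1)
        nb = b + random.uniform(-0.1, 0.1)
        cand = loss(nw, nb)
        # Always accept improvements; accept worse moves with a
        # temperature-dependent probability, to escape local minima.
        if cand < cur or random.random() < math.exp(-(cand - cur) / max(temp, 1e-12)):
            w, b, cur = nw, nb, cand
            if cur < best:
                best_w, best_b, best = w, b, cur
        temp *= cool
    return best_w, best_b

# Noise-free samples generated from y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(-5, 6)]
w, b = sa_fit(data)
```

The same accept-or-reject loop generalizes to the full weight vector of an MLP; the point of the global techniques cited above is precisely that occasionally accepting worse solutions keeps the search from stalling in a local minimum.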
Recently, artificial neural network based methods have been applied to robotic systems.
In (Racz & Dubrawski, 1994), an ANN was trained to estimate a robot’s position relative
to a particular local object. Robot localization was achieved by using entropy nets to
implement a regression tree as an ANN in (Sethi & Yu, 1990). An ANN can also be
trained to correct pose estimates from odometry using ultrasonic sensors.
Conforth and Meng (2008) proposed an artificial neural network learning method
for mobile robot localization, which combines two popular swarm-inspired methods from
the computational intelligence area, Ant Colony Optimization (ACO) and Particle Swarm
Optimization (PSO), to train the Artificial Neural Network (ANN) models. These
algorithms have been applied already to solving problems of clustering, data mining,
dynamic task allocation, and optimization in autonomous mobile robots.
1.2. Project Description
A vehicle is to be designed which can intuitively navigate its way to destination
points, avoiding obstacles and obeying traffic rules without being pre-programmed.
Although a vehicle can be made to avoid obstacles using various common technologies
based on structured programming and standard algorithms, the intelligence of such a
vehicle will depend on whether sufficient conditions were addressed and whether the
algorithm is well designed. Afterwards, the drudgery of typing and debugging many lines
of code will have to be dealt with.
An improved technology is Artificial Intelligence, or Machine Learning. Although
some implement this technology on conventional object-oriented programming platforms,
better emerging technologies for the implementation of artificial intelligence include
Artificial Neural Networks and Support Vector Machines. The problem is to
train the car to identify objects on its own and reason based on the training it has
received.
A system is to be developed to acquire training data (images), utilize these data to
train an ANN in Matlab, and implement the converged network on a laptop controlling a
model mobile robot. The training data are collected using a laptop running Matlab that
controls the robot. The robot is steered through differential-drive control logic so
that it follows a path marked out on the floor. The primary sensor is a webcam
attached to the robot, which records images of the path to the laptop’s hard drive. The
images from the webcam are inputs to the Neural Network. These data are then used to
train a backpropagation ANN. After satisfactory ANN performance is achieved, the
converged ANN weight values are written to a file. These weight values are then read by a program
that implements a feed-forward ANN that reads webcam images, processes them,
and inputs the processed images to the ANN, which then produces corresponding
steering control signals for the differential-drive system. Simulations and controls are to be
implemented principally using the Matlab Image Processing Toolbox, the Matlab Artificial
Neural Networks Toolbox and Dynamixel control functions in Matlab.
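The control cycle described above can be sketched as follows. This is an illustrative Python fragment, not the thesis's Matlab implementation; the scene labels, command names, and toy weight vectors are assumptions made for the example:

```python
def classify(features, weights):
    """Feed-forward pass of a minimal linear classifier: one weight
    vector per scene class; the highest-scoring class wins."""
    scores = {label: sum(w * f for w, f in zip(ws, features))
              for label, ws in weights.items()}
    return max(scores, key=scores.get)

# Hypothetical mapping from identified scene class to a drive command.
COMMANDS = {"straight": "forward",
            "left_turn": "steer_left",
            "right_turn": "steer_right"}

def control_step(features, weights):
    """One control cycle: classify the processed image features,
    then emit the corresponding movement command."""
    return COMMANDS[classify(features, weights)]

# Toy 'converged' weights: each class simply responds to one feature.
weights = {"straight": [1, 0, 0],
           "left_turn": [0, 1, 0],
           "right_turn": [0, 0, 1]}
cmd = control_step([0.1, 0.9, 0.2], weights)
```

The real system replaces the toy classifier with the trained MLP or SOM, whose converged weights are loaded from the file produced during training.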
1.3. Objective of the Project
The primary objective of this project is to develop a procedure to implement and train an
Artificial Neural Network to control a mobile robot to follow a visibly distinguishable
path autonomously. The main idea is visual control of a vehicle for autonomous
navigation through a Neural Network platform (without stereotyped pre-programming).
1.4. Scope
To keep the task manageable, the problem has to be simplified and made
unambiguous. The task now is to make the vehicle take a certain action when it sees
images of specific classes. The project involves basic image acquisition and processing
for Neural Network training purposes and sending the output control commands to the
wheels of a model car. The circuitry utilized is that of the laptop computer. Separate
circuits have not been developed because ANNs require high memory capacity and
processing speed, which can only be provided by a high-performance computer.
Nevertheless, the hardware realization of ANNs is an ongoing research area.
Although allusions to and efforts towards generalization, effectiveness and
sophistication are made, the project does not treat path planning and road tracking
exhaustively.
1.5. Justification
The process of automating vehicle navigation can be broken down into four steps:
1) perceiving and modelling the environment, 2) localizing the vehicle within the
environment, 3) planning and deciding the vehicle’s desired motion and 4) executing the
vehicle’s desired motion (Wit, 2000). There has been much interest and research done in
each of these areas in the past decade. This work focuses on perceiving the environment
and deciding the vehicle’s desired motion. Further work on localization and on executing
the desired motion is ongoing. Nevertheless, more finesse has to be put into modelling
the environment and into planning and deciding the vehicle’s desired motion. The
above process is to be implemented with the Matlab Artificial Neural Network Toolbox
and Image Processing Toolbox.
Apart from the fact that the project arouses interest in solving the research
problems, the robot developed can be used for the scholastic purpose of concept
illustration, particularly in Control and Instrumentation Engineering courses.
CHAPTER TWO
LITERATURE REVIEW
2.1. Mobile Robots
2.1.1. Types of Mobile Robots
Many different types of mobile robots have been developed depending on the
kind of application, the velocity, and the type of environment, whether it is water,
space, or terrain with fixed or moving obstacles. Four major categories have been
identified (Dudek & Jenkin, 2000):
Terrestrial or ground-contact robots: The most common ones are the
wheeled robots; others are tracked vehicles and limbed vehicles, explained below:
1. Wheeled robots: Wheeled robots exploit friction or ground contact to
enable the robot to move. Different kinds of wheeled robots exist: the
differential drive robot, synchronous drive robot, steered wheels robots and
Ackerman steering (car drive) robots, the tricycle, bogey, and bicycle drive
robots, and robots with complex or compound or omnidirectional wheels.
2. Tracked vehicles: Tracked vehicles are robust to almost any terrain;
their construction is similar to that of the differential-drive robot, but the two
differential wheels are extended into treads which provide a large contact
area and enable the robot to navigate through a wide range of terrain.
3. Limbed vehicles: Limbed vehicles are suitable in rough terrains such as
those found in forests, near natural or man-made disasters, or in planetary
exploration, where ground contact support is not available for the entire
path of motion. Limbed vehicles are characterized by the design and the
number of legs. The minimum number of legs needed for a robot to move is
one; to be supported, a robot needs at least three legs, and four legs are
needed for a statically stable robot. Robots with six, eight, and twelve legs exist.
Aquatic robots: These operate on the water surface or underwater. Most use
water jets or propellers. Aquatic vehicles support propulsion by utilizing the
surrounding water. There are two common structures (Dudek & Jenkin, 2000):
torpedo-like structures (Ferguson & Pope, 1995, Kloske et al., 1993), in which a
single propeller provides forward and reverse thrust while the navigation
direction is controlled by the control surfaces and the buoyancy of the vessel
controls the depth. The disadvantage of this type is poor manoeuvrability.
Airborne robots: Flying robots such as robotic helicopters, fixed-wing aircraft,
robotically controlled parachutes, and dirigibles. The following
subclassifications are given according to Dudek and Jenkin (2000):
1. Fixed-wing autonomous vehicles: These utilize control systems very
similar to those found in commercial autopilots. A ground station can
provide remote commands if needed, and with the help of the Global
Positioning System (GPS) the location of the vehicle can be determined.
2. Automated helicopters (Baker et al., 1992, Lewis et al., 1993): These use
onboard computation and sensing as well as ground control; their control is
very difficult compared to that of fixed-wing autonomous vehicles.
3. Buoyant vehicles (aerobots, aerovehicles, or blimps): These vehicles can
float and are characterized by a high energy-efficiency ratio, long-range
travel and duty cycle, and vertical mobility, and they usually have no
disastrous results in case of failure.
4. Unpowered autonomous flying vehicles: These vehicles reach their desired
destination by utilizing gravity, GPS, and other sensors.
Space robots: These are designed to operate in the microgravity of outer
space and are typically envisioned for space station maintenance. Space robots
either move by climbing or are independently propelled. They are needed for
applications related to space stations, such as construction, repair, and
maintenance. Free-flying systems have been proposed in which the spacecraft is
equipped with thrusters and one or more manipulators; the thrusters are
utilized to modify the robot’s trajectory.
2.1.2. Systems and Methods for Mobile Robot Navigation
Navigation is the major challenge for autonomous mobile robots; a
navigation system is the method for guiding a vehicle. Several capabilities are
needed for autonomous navigation (Alhaj Ali, 2003):
• The ability to execute elementary goal achieving actions such as going to a given
location or following a leader;
• The ability to react to unexpected events in real time such as avoiding a suddenly
appearing obstacle;
• The ability to formulate a map of the environment;
• The ability to learn, which might include noting the location of an obstacle and
the three-dimensional nature of the terrain, and adapting the drive torque to the
inclination of hills (Golnazarian & Hall, 2000).
The following basic systems and methods have been identified for mobile
robot navigation:
1. Odometry and other dead-reckoning methods: These methods use encoders
to measure wheel rotation and/or steering orientation.
2. Vision-based navigation: Computer vision and image sequence techniques
have been proposed for obstacle detection and avoidance for autonomous land
vehicles that navigate in outdoor road environments. The object shape
boundary is first extracted from the image; after the translation from the
vehicle location in the current cycle to that in the next cycle, the position of the
object shape in the next-cycle image is predicted and then matched
with the extracted shape of the object in that image, to decide
whether the object is an obstacle (Alhaj Ali, 2003, Chen & Tsai, 2000).
3. Sensor-based navigation: Sensor-based navigation systems that rely on sonar
or laser scanners providing one-dimensional distance profiles have been
used for collision and obstacle avoidance. A general adaptable control
structure is also required. The mobile robot must make decisions on its
navigation tactics: which information to use to modify its position,
which path to follow around obstacles, when stopping is the safest alternative,
and in which direction to proceed when no path is given. In addition, sensor
information can be used for constructing maps of the environment for short-
term reactive planning and long-term environmental learning.
4. Inertial navigation: This method uses gyroscopes and sometimes
accelerometers to measure the rate of rotation and acceleration.
5. Active beacon navigation systems: This method computes the absolute
position of the robot by measuring the direction of incidence of three or
more actively transmitted beacons. The transmitters, usually using light or
radio frequencies must be located at known sites in the environment (Janet,
1997, Premvuti & Wang, 1996, Alhaj Ali, 2003).
6. Landmark navigation: In this method distinctive artificial landmarks are
placed at known locations in the environment to be detected even under
adverse environmental conditions.
7. Map-based positioning: In this method information acquired from the robot's
onboard sensors is compared to a map or world model of the environment. The
vehicle's absolute location can be estimated if features from the sensor-based
map and the world model map match.
8. Biological navigation: Biologically-inspired approaches have been utilized
in the development of intelligent adaptive systems; biomimetic systems
provide a real-world test of biological navigation behaviours, besides making
new navigation mechanisms available for indoor robots.
9. Global positioning system (GPS): This system provides specially coded
satellite signals that can be processed in a GPS receiver, enabling it to compute
position, velocity, and time.
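As an illustration of method 1 above (odometry and dead reckoning), the pose update for a differential-drive vehicle can be sketched as follows. This is a minimal sketch, assuming a two-wheeled platform; the function and parameter names are illustrative and not taken from any cited system.

```python
import math

def dead_reckon(pose, d_left, d_right, wheel_base):
    """Update an (x, y, heading) pose from incremental wheel-encoder
    distances using the standard differential-drive odometry equations.
    All names here are illustrative assumptions."""
    x, y, theta = pose
    d_center = (d_left + d_right) / 2.0        # distance travelled by midpoint
    d_theta = (d_right - d_left) / wheel_base  # change in heading
    x += d_center * math.cos(theta + d_theta / 2.0)
    y += d_center * math.sin(theta + d_theta / 2.0)
    return (x, y, theta + d_theta)

# Straight-line motion: both wheels advance 1.0 m, so heading is unchanged.
pose = dead_reckon((0.0, 0.0, 0.0), 1.0, 1.0, 0.5)
```

Because the update integrates encoder increments, small measurement errors accumulate over time, which is why dead reckoning is usually combined with the absolute positioning methods listed below it.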
2.1.3. Mobile Robots and Artificial Intelligence
While robotics research has mainly been concerned with vision (eyes) and
tactile sensing (touch), some problems of adapting, reasoning, and responding
to a changed environment have been solved with the help of artificial
intelligence, using heuristic methods such as ANNs.
Neural computers have been suggested to provide a higher level of
intelligence that allows the robot to plan its action in a normal environment as
well as to perform non-programmed tasks (Golnazarian & Hall, 2000, Alhaj Ali,
2003).
A well-established field in the discipline of control systems is
intelligent control, which represents a generalization of the concept of control
to include autonomous anthropomorphic interactions of a machine with the
environment (Alhaj Ali, 2003; Meystel & Albus, 2002). Meystel and Albus (2002)
defined intelligence as “the ability of a system to act appropriately in an uncertain
environment, where an appropriate action is that which increases the probability
of success, and success is the achievement of the behavioural sub goals that
support the system’s ultimate goal”. The intelligent systems act so as to maximize
this probability. Both goals and success criteria are generated in the environment
external to the intelligent system. At a minimum, the intelligent system must be
able to sense the environment (by the use of sensors), then perceive and
interpret the situation in order to make decisions (by the use of analyzers),
and finally implement the proper control actions (by using actuators or
drives).
Higher levels of intelligence require the abilities to recognize objects and
events, store and use knowledge about the world, learn, and reason about and
plan for the future. Advanced forms of intelligence have the ability to perceive
and analyze, to plan and scheme, to choose wisely, and to plan successfully in a
complex, competitive, and hostile world (Alhaj Ali, 2003).
Intelligent behaviour is crucial to mobile robots; it could be supported by
connecting perception to action (Kortenkamp et al., 1998). In the following, a
brief review of the literature on the use of artificial intelligence in mobile
robots is presented (Alhaj Ali, 2003).
2.1.3.1. Use of Artificial Neural Networks (ANN)
ANNs have been applied to mobile robot navigation, and have been considered
for applications that focus on recognition and classification of path features
during navigation. Kurd and Oguchi (1997) proposed the use of a neural network
controller, trained using supervised learning, as an indirect controller to
obtain the best control parameters for the main controller in use with respect
to the position of an Autonomous Ground Vehicle (AGV). A method that uses incremental
learning and classification based on a self-organizing ANN is described by
Vercelli and Morasso (1998). Xue and Cheung (1996) proposed a neural network
control scheme for controlling active suspension. The presented controller used a
multi-layer neural network and a prediction-correction method for adjusting
learning parameters. Dracopoulos (1998) presented the application of multi-layer
perceptrons to the robot path planning problem and in particular to the task of
maze navigation.
Zhu et al. (1998) presented results of integrating omni-directional view
image analysis and a set of adaptive networks to understand the outdoor road
scene by a mobile robot (Alhaj Ali, 2003). To navigate and recognize where it is,
a mobile robot must be able to identify its current location. The more the robot
knows about its environment, the more efficiently it can operate (Cicirelli, 1998).
Grudic and Lawrence (1998) used a nonparametric learning algorithm to build a
robust mapping between an image obtained from a mobile robot’s on-board
camera, and the robot’s current position. It used the learning data obtained from
these raw pixel values to automatically choose a structure for the mapping without
human intervention, or any prior assumptions about what type of image features
should be used (Alhaj Ali, 2003).
2.1.3.2. Use of Fuzzy Logic
Fuzzy logic and fuzzy languages have also been used in navigation
algorithms for mobile robots as described in (Wijesoma et al., 1999, Mora and
Sanchez, 1998). Lin and Wang (1997) proposed a fuzzy logic approach to guide an
AGV from a starting point toward the target without colliding with any static
or moving obstacles; they also studied other issues such as sensor
modelling and trap recovery. Kim and Hyung (1998) used fuzzy multiattribute
decision-making in deciding which via-point the robot should proceed to at each
step. The via-point is a local target point for the robot’s movement at each
decision step. A set of candidate via-points is constructed at various headings and
velocities. Watanabe et al. (1998) described a method using a fuzzy logic model
for the control of a time varying rotational angle in which multiple linear models
are obtained by utilizing the original nonlinear model at some representative
angles (Alhaj Ali, 2003).
2.1.3.3. Use of Neural Integrated Fuzzy Controller
A neural integrated fuzzy controller (NiF-T) that integrates the fuzzy logic
representation of human knowledge with the learning capability of neural
networks has been developed for nonlinear dynamic control problems (Alhaj Ali,
2003). Daxwanger and Schmidt (1998) presented their neuro-fuzzy approach for
visual guidance of a mobile robot vehicle.
2.2. Autonomous Vehicles
2.2.1. Road Recognition and Following
The road recognition, detection, and following problem for autonomous
vehicles (also known as unmanned vehicles or wheeled robots) has been an active
research area for the past several decades. Road detection for mobile robots is
required in environments that are dangerous for human beings. Moreover,
it can be used to assist humans while driving or operating a vehicle.
Road detection is an important requirement for autonomous navigation
even in the presence of assisting technologies such as GPS. Road recognition can
be performed using sensors such as laser sensors and omnivision cameras, and
several algorithms and applications developed in the literature offer
satisfactory solutions. However, most of these solutions cannot be
applied to all types of roads a mobile robot has to deal with during autonomous
navigation. With regard to their setting, roads can be classified into two
groups: structured and unstructured roads.
Research on road detection for structured roads (i.e. asphalt roads) has
produced well-working solutions. Satisfactory unstructured road detection
algorithms using several sensors other than vision sensors are also available in
the literature. However, unstructured road detection through the sole use of
vision sensors is still an open research area.
The problem of road detection can, to a large extent, be solved by
tackling the issue of pattern recognition. Pattern recognition is a process of
description, grouping, and classification of patterns. In terms of information
availability, there are two general paradigms for pattern recognition:
supervised and unsupervised schemes. A supervised scheme identifies an unknown
pattern as a member of a predefined class, while an unsupervised scheme groups
input patterns into a number of clusters defined as classes.
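The distinction between the two paradigms can be sketched in a few lines of toy code. This is purely illustrative; the class names, feature values, and grouping rule are assumptions, not taken from the cited literature.

```python
def nearest_mean_classify(x, class_means):
    """Supervised: assign feature x to the predefined class whose mean
    is closest (a nearest-class-mean classifier)."""
    return min(class_means, key=lambda c: abs(x - class_means[c]))

def two_cluster(points):
    """Unsupervised: group 1-D points into two clusters by proximity to
    the two extreme values (a crude one-pass grouping)."""
    lo, hi = min(points), max(points)
    return [0 if abs(p - lo) <= abs(p - hi) else 1 for p in points]

# Supervised: the class labels ("road", "not_road") are known in advance.
label = nearest_mean_classify(2.1, {"road": 2.0, "not_road": 8.0})
# Unsupervised: no labels; the data are grouped into clusters.
clusters = two_cluster([1.0, 1.2, 8.9, 9.1])
```

The supervised call needs predefined classes with known statistics, whereas the unsupervised call discovers the grouping from the data alone, mirroring the two schemes described above.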
Automatic pattern recognition has two primary tasks: feature extraction
and classification. Classical pattern recognition techniques include feature
extraction and dimensionality reduction.
Examples of classifiers include:
1. Bayesian optimal classifier
2. Exemplar classifier
3. Space partition methods
4. Neural Networks.
Generic feature extraction methods include:
1. Wavelet based analysis
2. Invariant moments
3. Entropy
4. Cepstrum analysis
5. Fractal dimension
Methods of algorithm selection are based on image preprocessing,
pattern recognition using geometric algorithms, line detection, extraction of
curved lines, semantic retrieval by spatial relationships, and structural object
recognition using shape-from-shading (Funtanilla, 2008).
Generally speaking, a complete pattern recognition system employs a
sensor, a feature extraction mechanism, and a classification scheme. Pattern
recognition falls under the artificial intelligence and data processing fields,
where the usual approaches are statistical (or decision-theoretic), syntactic
(or structural), or neural. The statistical approach is based on patterns
generated by a probabilistic system or from statistical analysis. The syntactic
or structural approach is based on structural relationships of features. The
neural approach uses a neural computing environment with a neural network
structure.
The camera serves as the most common sensor system. The digital images
taken remotely from this sensor act as the objects from which features are
established and significant patterns extracted. A pattern is defined as an
arrangement of descriptors (length, diameter, shape numbers, regions); a feature
denotes a descriptor, and a class is defined by the family of patterns that
share a set of common properties. The two principal arrangements used in
computerized pattern recognition with MATLAB programming are defined in
Gonzalez et al. (2004) as vectors, for quantitative descriptions (decision-theoretic),
and strings, for structural descriptions or recognition (represented by symbolic
information properties and relationships).
Quantitative descriptors such as length, area and texture fall in the area of
decision theoretic computerized pattern recognition system. Image pre-processing
techniques, such as image conversion, edge detection, image restoration and
image segmentation, are important prerequisites to computerized image
processing. MATLAB implements point, line and peak detection in the image
segmentation process. The segmentation process continues until the level of
detail needed to identify the element (point, line, or peak) has been isolated,
which is limited by the choice of imaging sensor in remote processing
applications (Gonzalez et al., 2004).
To complete the process for an efficient pattern recognition system, Baker
(1996) developed a pattern rejector algorithm based on object recognition and
local feature detection.
Another method of pattern recognition was developed by Zhang et al.
(2008), wherein image splicing detection is treated as a two-class pattern
recognition problem that builds the model using moment features and some
image quality metrics (IQMs) extracted from the given test image.
2.3. Vision and Image Processing
Digital image processing is the process of transforming digital information
(images), for the following reasons:
1. To improve pictorial information for human interpretation, through:
noise removal;
corrections for motion, camera position and distortion;
enhancements by changing contrast and colour.
2. To process pictorial information by machine, through:
segmentation - dividing an image up into constituent parts;
representation - representing an image by some more abstract models;
classification.
3. To reduce the size of image information for efficient handling, through:
compression with loss of digital information that minimizes the loss of
"perceptual" information (JPEG, GIF, MPEG);
multiresolution representations versus quality of service.
2.3.1. Pixels:
Photographs, for example, are described by breaking an image up into a mosaic of
colour squares (pixels). Depending on their final destination, the number of
pixels used per inch (PPI or DPI) varies.
MATLAB stores most images as two-dimensional arrays (i.e., matrices), in
which each element of the matrix corresponds to a single pixel in the displayed
image. For example, an image composed of 200 rows and 300 columns of
different coloured dots would be stored in MATLAB as a 200-by-300 matrix.
Some images, such as RGB, require a three-dimensional array, where the first
plane in the 3rd dimension represents the red pixel intensities, the second plane
represents the green pixel intensities, and the third plane represents the blue pixel
intensities.
To reduce memory requirements, MATLAB supports storing image data in
arrays of class uint8 and uint16. The data in these arrays is stored as 8-bit
or 16-bit unsigned integers. These arrays require one-eighth or one-fourth as much
memory as data in double arrays. An image whose data matrix has class uint8 is
called an 8-bit image; an image whose data matrix has class uint16 is called a 16-
bit image.
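The memory saving described above can be checked directly. The following sketch uses Python's standard array module in place of MATLAB arrays; this substitution is an assumption made purely for illustration.

```python
from array import array

n_pixels = 200 * 300                      # a 200-by-300 grayscale image
img_u8 = array('B', bytes(n_pixels))      # 8-bit unsigned integers (uint8)
img_f64 = array('d', [0.0]) * n_pixels    # double-precision values

bytes_u8 = len(img_u8) * img_u8.itemsize    # 1 byte per pixel
bytes_f64 = len(img_f64) * img_f64.itemsize # 8 bytes per pixel
ratio = bytes_f64 // bytes_u8               # doubles need 8x the memory
```

The 8-bit array occupies one-eighth of the memory of the double-precision array, which is exactly the saving the text attributes to uint8 storage.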
2.3.2. Indexed Images
An indexed image consists of a data matrix, X, and a colormap matrix,
map. map is an m-by-3 array of class double containing floating-point values in
the range [0, 1]. Each row of map specifies the red, green, and blue components
of a single colour. An indexed image uses "direct mapping" of pixel values to
colormap values. The colour of each image pixel is determined by using the
corresponding value of X as an index into map. The value 1 points to the first row
in map, the value 2 points to the second row, and so on. You can display an
indexed image with the statements:
image(X); colormap(map)
A colormap is often stored with an indexed image and is automatically
loaded with the image when you use the imread function. However, you are not
limited to the default colormap; you can use any colormap that you choose.
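The direct-mapping scheme can be sketched outside MATLAB as well. This is a Python sketch; the colormap rows and pixel values are illustrative assumptions.

```python
# map: each row gives (R, G, B) components in [0, 1].
cmap = [(0.0, 0.0, 0.0),   # row 1: black
        (1.0, 0.0, 0.0),   # row 2: red
        (1.0, 1.0, 1.0)]   # row 3: white

# X: the data matrix of 1-based row indices into the colormap.
X = [[1, 2],
     [3, 1]]

# Direct mapping: pixel value k selects row k of the colormap
# (k - 1 here, because Python lists are 0-based while map rows are 1-based).
rgb = [[cmap[v - 1] for v in row] for row in X]
```

The pixel holding the value 2 comes out red, exactly as "the value 2 points to the second row" describes.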
2.3.3. Methods in Image Processing
These methods are used to acquire images through the webcam and
convert them to forms which can be used by the neural network. The image
processing activity can be broken down into the following sub-steps.
1. Image acquisition: An image is acquired into the system as an input
through the digital web camera. This image should have a specific format,
for example, BMP format, and a determined size, such as 30 x 20 pixels.
2. Image pre- and post-processing: The preprocessing stage involves:
a. Binarization: the conversion of the raw images acquired to
gray-scale and then to a binary image by choosing a threshold value
from the gray-scale elements.
b. Morphological operators: these remove isolated specks and holes in
the binary images; the majority operator can be used.
c. Noise removal: reducing noise in an image. In on-line mode there is
no noise to eliminate, so noise removal is unnecessary. In off-line
mode, noise may come from surface roughness and tiny particles of
dirt or debris moving with the breeze.
Other processing operations that can be carried out include:
Contrast enhancement
De-blurring
Region-based processing
Linear and non-linear filtering
3. Image analysis: This includes, amongst others:
a. Edge detection
b. Segmentation
2.4. Artificial Neural Networks
2.4.1. Introduction
Artificial neural networks are made up of interconnecting artificial neurons
(programming constructs that mimic the properties of biological neurons).
Artificial neural networks may either be used to gain an understanding of
biological neural networks, or for solving artificial intelligence problems without
necessarily creating a model of a real biological system. The real, biological
nervous system is highly complex and includes some features that may seem
superfluous based on an understanding of artificial networks.
In general, a biological neural network is composed of a group or groups
of chemically connected or functionally associated neurons. A single neuron may
be connected to many other neurons and the total number of neurons and
connections in a network may be extensive. Connections, called synapses, are
usually formed from axons to dendrites, though dendrodendritic microcircuits and
other connections are possible. Apart from the electrical signaling, there are other
forms of signaling that arise from neurotransmitter diffusion, which have an effect
on electrical signaling. As such, neural networks are extremely complex.
Artificial intelligence and cognitive modeling try to simulate some
properties of neural networks. Though similar in their techniques, the former has the
aim of solving particular tasks, while the latter aims to build mathematical models
of biological neural systems. In the artificial intelligence field, artificial neural
networks have been applied successfully to speech recognition, image analysis
and adaptive control, in order to construct software agents (in computer and video
games) or autonomous robots. Most of the currently employed artificial neural
networks for artificial intelligence are based on statistical estimation, optimization
and control theory. Artificial intelligence, cognitive modeling, and neural
networks are information processing paradigms inspired by the way biological
neural systems process data.
A neural network (NN), in the case of artificial neurons called artificial
neural network (ANN) or simulated neural network (SNN), is an interconnected
group of natural or artificial neurons that uses a mathematical or computational
model for information processing based on a connectionist approach to
computation. In most cases an ANN is an adaptive system that changes its
structure based on external or internal information that flows through the network.
In more practical terms neural networks are non-linear statistical data modeling or
decision making tools. They can be used to model complex relationships between
inputs and outputs or to find patterns in data.
According to Abdi (1999), neural networks are adaptive statistical models
based on an analogy with the structure of the brain. They are adaptive because
they can learn to estimate the parameters of some population using a small number
of exemplars (one or a few) at a time. They do not differ essentially from standard
statistical models. For example, one can find neural network architectures akin to
discriminant analysis, principal component analysis, logistic regression, and other
techniques. (Jordan & Bishop, 1996).
Many neural network methods can be viewed as generalizations of
classical pattern-oriented techniques in statistics and the engineering areas of
signal processing, system identification, optimization, and control theory. There
are also ties to parallel processing, VLSI design, and numerical analysis.
A neural network is first and foremost a graph, with patterns represented in
terms of numerical values attached to the nodes of the graph and transformations
between patterns achieved via simple message-passing algorithms. Certain of the
nodes in the graph are generally distinguished as being input nodes or output
nodes, and the graph as a whole can be viewed as a representation of a
multivariate function linking inputs to outputs. Numerical values (weights) are
attached to the links of the graph, parameterizing the input/output function and
allowing it to be adjusted via a learning algorithm.
A broader view of neural network architecture involves treating the
network as a statistical processor, characterized by making particular probabilistic
assumptions about data. Patterns appearing on the input nodes or the output nodes
of a network are viewed as samples from probability densities, and a network is
viewed as a probabilistic model that assigns probabilities to patterns. However, the
paradigm of neural networks (i.e., implicit, not explicit, learning is stressed)
seems to correspond more to some kind of natural intelligence than to
traditional Artificial Intelligence, which would stress, instead, rule-based learning.
Neural networks usually organize their units (called neurons) into several
layers. The first layer is called the input layer, the last one the output layer. The
intermediate layers (if any) are called the hidden layers. The information to be
analyzed is fed to the neurons of the first layer and then propagated to the neurons
of the second layer for further processing. The result of this processing is then
propagated to the next layer and so on until the last layer. Each unit receives some
information from other units (or from the external world through some devices)
and processes this information, which will be converted into the output of the unit.
The goal of the network is to learn or to discover some association
between input and output patterns, or to analyze, or to find the structure of the
input patterns. The learning process is achieved through the modification of the
connection weights between units. In statistical terms, this is equivalent to
interpreting the value of the connections between units as parameters (e.g.,
like the values of a and b in the regression equation y = a + b*x) to be
estimated. The learning process specifies the "algorithm" used to estimate the parameters.
2.4.2. Neural Network Architecture
Neural networks are made of basic units (Figure 2.1) arranged in layers. A
unit collects information provided by other units (or by the external world) to
which it is connected with weighted connections called synapses. These weights,
called synaptic weights, multiply (i.e., amplify or attenuate) the input information:
a positive weight is considered excitatory, a negative weight inhibitory.
Figure 2.1: The basic neural unit processes the input information into the output
information.
Each of these units is a simplified model of a neuron and transforms its
input information into an output response. This transformation involves two steps:
first, the activation of the neuron is computed as the weighted sum of its inputs,
and second, this activation is transformed into a response by using a transfer
function.
The three basic Neural Network architectures are:
1. Feed-forward networks: All signals flow in one direction only, i.e. from lower
layers (input) to upper layers (output).
2. Recurrent (feed-back) networks: Signals from neurons in upper layers are fed
back to neurons in either their own or lower layers.
3. Cellular: Neurons are connected in a cellular manner.
2.4.3. Transfer Functions
If each input is denoted $x_i$ and each weight $w_i$, then the activation is equal
to $a = \sum_i w_i x_i$, and the output, denoted $o$, is obtained as $o = f(a)$.
Different transfer (or activation) functions, $f(\cdot)$, exist for transforming
the weighted sum of the inputs to outputs. The most commonly used ones are
enumerated below (refer to the graphs in figure 2.2):
1. Threshold (sgn) function: $f(a) = +1$ if $a \ge 0$, and $f(a) = -1$ otherwise.
2. Piecewise linear function: $f(a) = a$ for $-1 \le a \le 1$, saturating at $-1$ and $+1$ outside this range.
3. Linear function: $f(a) = a$.
4. Sigmoid function: $f(a) = \dfrac{1}{1 + e^{-a}}$.
5. Gaussian function: $f(a) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp\left(-a^2 / 2\sigma^2\right)$.
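A hedged sketch of a single unit with a selection of these transfer functions follows. Python is used here purely for illustration; the function names and example weights are assumptions, not from the text.

```python
import math

def threshold(a):            # sgn-type unit: fires +1 or -1
    return 1.0 if a >= 0 else -1.0

def piecewise_linear(a):     # linear in [-1, 1], clipped outside
    return max(-1.0, min(1.0, a))

def sigmoid(a):              # logistic function, output in (0, 1)
    return 1.0 / (1.0 + math.exp(-a))

def unit_output(inputs, weights, f):
    """Compute the unit's activation (weighted sum of inputs) and pass
    it through the chosen transfer function f."""
    a = sum(x * w for x, w in zip(inputs, weights))
    return f(a)

o = unit_output([1.0, 0.5], [0.4, -0.8], threshold)
```

The two steps described above (weighted sum, then transfer function) are visible directly in `unit_output`, and swapping `f` changes the unit's response shape without touching the activation computation.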
2.4.4. Learning rules
Neural networks are adaptive statistical devices. This means that they can
change iteratively the values of their parameters (i.e., the synaptic weights) as a
function of their performance. These changes are made according to learning rules
which can be characterized as supervised (when a desired output is known and
used to compute an error signal) or unsupervised (when no such error signal is
used). The types of learning are:
2.4.4.1. Supervised Learning:
The Widrow-Hoff rule (a.k.a. gradient descent or the delta rule) is the most widely
known supervised learning rule. It uses the difference between the actual output of
the cell and the desired output as an error signal for units in the output layer. Units
in the hidden layers cannot directly compute their error signal but estimate it as a
function (e.g., a weighted average) of the error of the units in the following layer.
This adaptation of the Widrow-Hoff learning rule is known as error
backpropagation. With Widrow-Hoff learning, the correction to the synaptic
weights is proportional to the error signal multiplied by the value of the activation
given by the derivative of the transfer function. Using the derivative has the effect
of making finely tuned corrections when the activation is near its extreme values
(minimum or maximum) and larger corrections when the activation is in its middle
range. Each correction has the immediate effect of making the error signal smaller
if a similar input is applied to the unit.
In general, supervised learning rules implement optimization algorithms
akin to descent techniques because they search for a set of values for the free
parameters (i.e., the synaptic weights) of the system such that some error function
computed for the whole network is minimized.
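A minimal sketch of the Widrow-Hoff update for a single linear output unit follows; the learning rate and training example are illustrative assumptions.

```python
def widrow_hoff_step(weights, x, target, rate=0.1):
    """One Widrow-Hoff (delta rule) update for a single linear unit:
    w_i <- w_i + rate * (target - output) * x_i.
    The correction is proportional to the error signal."""
    output = sum(w * xi for w, xi in zip(weights, x))
    error = target - output
    return [w + rate * error * xi for w, xi in zip(weights, x)]

# Repeatedly presenting the same example drives the error toward zero.
w = [0.0, 0.0]
for _ in range(200):
    w = widrow_hoff_step(w, [1.0, 2.0], 1.0)
output = w[0] * 1.0 + w[1] * 2.0   # approaches the target of 1.0
```

Each correction makes the error smaller when a similar input is applied again, which is exactly the property the paragraph above ascribes to the rule.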
2.4.4.2. Unsupervised Learning
The Hebbian rule is the most widely known unsupervised learning rule; it
is based on work by the Canadian neuropsychologist Donald Hebb, who theorized
that neuronal learning (i.e., synaptic change) is a local phenomenon expressible in
terms of the temporal correlation between the activation values of neurons.
The synaptic change depends on both presynaptic and postsynaptic
activities: the change in a synaptic weight is a function of the temporal
correlation between the presynaptic and postsynaptic activities. Specifically,
the value of the synaptic weight between two neurons increases whenever they are
in the same state, and decreases when they are in different states.
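The Hebbian rule can be sketched in a few lines; the +1/-1 state encoding and the learning rate are illustrative assumptions.

```python
def hebbian_step(w, pre, post, rate=0.5):
    """Hebbian update: the weight changes with the correlation between
    presynaptic and postsynaptic activity (states encoded as -1/+1).
    Same state -> product +1 -> weight increases;
    different states -> product -1 -> weight decreases."""
    return w + rate * pre * post

w1 = hebbian_step(0.0, +1, +1)   # neurons in the same state: weight grows
w2 = hebbian_step(w1, +1, -1)    # different states: weight shrinks back
```

No error signal or target output appears anywhere in the update, which is what makes the rule unsupervised.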
2.4.5. Some important neural network paradigms
Neural network paradigms are formulated by a combination of network
architecture (or model) and a learning rule with some modifications. Refer to
figure 2.3 for the different learning paradigms.
One of the most popular paradigms in neural networks is the multi-layer perceptron
(Figure 2.4). Most of the networks with this architecture use the Widrow-Hoff rule
as their learning algorithm and the logistic function as the transfer function of the
units of the hidden layer (the transfer function is in general non-linear for these
neurons). These networks are very popular because they can approximate any
multivariate function relating the input to the output. In a statistical framework,
these networks are akin to multivariate non-linear regression. When the input
patterns are the same as the output patterns, these networks are called auto-
associators. They are closely related to linear (if the hidden units are linear) or
non-linear principal component analysis and other statistical techniques linked to
the general linear model (see Abdi et al., 1996), such as discriminant analysis or
correspondence analysis.
Figure 2.2: Different types of activation functions: (a) threshold, (b) piecewise
linear, (c) sigmoid and (d) Gaussian.
Figure 2.3: A taxonomy of feed-forward and recurrent/feedback network
architectures (Jain, 1996).
A recent development generalizes the radial basis function (rbf) networks
(Abdi et al, 1999) and integrates them with statistical learning theory (Vapnik,
1999) under the name of support vector machine or SVM (see Schölkopf et al,
2003). In these networks, the hidden units (called the support vectors) represent
possible (or even real) input patterns, and their response is a function of their
similarity to the input pattern under consideration. The similarity is evaluated by a
kernel function (e.g., dot product; in the radial basis function the kernel is the
Gaussian transformation of the Euclidean distance between the support vector and
the input). In the specific case of rbf networks, the outputs of the units of the
hidden layers are connected to an output layer composed of linear units. In fact,
these networks work by breaking the difficult problem of nonlinear
approximation into two simpler ones. The first step is a simple nonlinear
mapping (the Gaussian transformation of the distance from the kernel to the input
pattern), the second step corresponds to a linear transformation from the hidden
layer to the output layer. Learning occurs at the level of the output layer. The main
difficulty with these architectures resides in the choice of the support vectors and
the specific kernels to use. These networks are used for pattern recognition,
classification, and for clustering data.
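The two-step structure described above (a Gaussian hidden layer over the support vectors, followed by a linear output layer) can be sketched as follows. This is a one-dimensional toy example; the centres, output weights, and kernel width are illustrative assumptions.

```python
import math

def rbf_output(x, centers, out_weights, width=1.0):
    """Step 1 (nonlinear): each hidden unit responds with the Gaussian
    transformation of its distance to the input.
    Step 2 (linear): the hidden responses are combined by a linear
    output layer, where learning would take place."""
    hidden = [math.exp(-((x - c) ** 2) / (2 * width ** 2)) for c in centers]
    return sum(h * w for h, w in zip(hidden, out_weights))

# A hidden unit centred exactly on the input responds maximally (1.0).
y = rbf_output(2.0, centers=[2.0, 5.0], out_weights=[1.0, 0.0])
```

The hard design decisions the text mentions, choosing the support vectors (`centers`) and the kernel, sit outside this function; only the output-layer weights would be learned.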
2.4.6. Validation
From a statistical point of view, neural networks represent a class of
nonparametric adaptive models. In this framework, an important issue is to
evaluate the performance of the model. This is done by separating the data into
two sets: the training set and the testing set. The parameters (i.e., the value of the
synaptic weights) of the network are computed using the training set. Then
learning is stopped and the network is evaluated with the data from the testing set.
This cross-validation approach is akin to the bootstrap or the jackknife.
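The training/testing separation can be sketched as follows; the 70/30 split fraction is an illustrative assumption.

```python
def split(data, train_fraction=0.7):
    """Separate the data into two sets: the synaptic weights are fitted
    on the training set, and the network is then evaluated on the
    held-out testing set it has never seen."""
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]

samples = list(range(10))   # stand-in for labelled patterns
train, test = split(samples)
```

Because the testing set plays no part in fitting the weights, the error measured on it estimates how the network will behave on genuinely new patterns, which is the point of the validation step described above.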
The utility of artificial neural network models lies in the fact that they can
be used to infer a function from observations and also to use it. This is particularly
useful in applications where the complexity of the data or task makes the design of
such a function by hand impractical.
2.4.7. Real life applications
The tasks to which artificial neural networks are applied tend to fall within
the following broad categories:
Function approximation, or regression analysis, including time series
prediction and modeling.
Classification, including pattern and sequence recognition, novelty
detection and sequential decision making.
Data processing, including filtering, clustering, blind signal separation and
compression.
Application areas of ANNs include system identification and control
(vehicle control, process control), game-playing and decision making
(backgammon, chess, racing), pattern recognition (radar systems, face
identification, object recognition, etc.), sequence recognition (gesture, speech,
handwritten text recognition), medical diagnosis, financial applications, data
mining (or knowledge discovery in databases, "KDD"), visualization and e-mail
spam filtering.
Computational devices have been created in CMOS for both biophysical
simulation and neuromorphic computing. More recent efforts show promise for
creating nanodevices for very large scale principal components analyses and
convolution (Yang et al, 2008). If successful, these efforts could usher in a new
era of neural computing that is a step beyond digital computing, because it
depends on learning rather than programming and because it is fundamentally
analog rather than digital even though the first instantiations may in fact be with
CMOS digital devices (Strukov et al, 2008).
2.4.8. Case Study of Two Neural Network Paradigms
2.4.8.1. Supervised Learning: Multi-Layer Perceptron
A feed-forward network has a layered structure. Each layer consists of
units which receive their input from units in the layer directly below and send
their output to units in the layer directly above; there are no connections
within a layer (Kröse and Smagt, 1996). The inputs are fed into the first layer of
hidden units. The input units are merely 'fan-out' units; no processing takes place
in these units. The activation of a hidden unit is a function of the weighted inputs
plus a bias. The output of the hidden units is distributed over the next layer of
hidden units, until the last layer of hidden units, whose outputs are fed into a
layer of output units as shown in figure 2.4.
The Multi-Layer Perceptron can establish decision regions which are
much more complex than the two half-planes generated by the single-layer perceptron.
The back-propagation learning rule is one solution to the problem of how
to learn the weights and biases in the network. It is an iterative procedure. For
each weight and threshold, the new value is computed by adding a correction to
the old value:
w_{jk}(t+1) = w_{jk}(t) + \Delta w_{jk}(t) \quad (1)
\theta_{j}(t+1) = \theta_{j}(t) + \Delta\theta_{j}(t) \quad (2)
To compute \Delta w_{jk}(t) and \Delta\theta_{j}(t), the gradient descent iterative method is
used. Then the back-propagation (or generalized delta rule) adjusts the weights of
the network using the cost- or error-function. In high dimensional input spaces the
network represents a (hyper)plane, and it is possible to obtain multiple outputs.
Suppose the network is to be trained so that a hyperplane is fitted as well as
possible to a set of training samples consisting of input values x^p and the
desired (or target) output values d^p. For every given input sample, the actual
output y^p of the network differs from the target value by (d^p − y^p). The error
function (also known as the least mean square criterion) is the summed squared
error. The total error is given as:
E = \frac{1}{2} \sum_{p} \sum_{o} (d_{o}^{p} - y_{o}^{p})^{2} \quad (3)
where the index p ranges over the set of input patterns and E^p represents the error
on pattern p. The LMS procedure finds the values of all the weights that minimise
the error function by a method called gradient descent: each weight is changed
in proportion to the negative of the derivative of the error, as measured on the
current pattern, with respect to that weight.
The fundamental idea of the back-propagation rule is that errors for the
units of the hidden layer are determined by back-propagating the errors of the
units of the output layer.
The output of a network is formed by the activation of the output neuron.
The activation function for a multilayer feed-forward network must be a
differentiable function of the total input:
y_{k} = \mathcal{F}(s_{k}) \quad (4)
where
s_{k} = \sum_{j} w_{jk} y_{j} + \theta_{k} \quad (5)
According to the gradient descent method, the generalised delta rule is given as:
\Delta_{p} w_{jk} = -\gamma \frac{\partial E^{p}}{\partial w_{jk}} \quad (6)
The error measure defined as the total quadratic error for pattern p at the output
units is:
E^{p} = \frac{1}{2} \sum_{o} (d_{o}^{p} - y_{o}^{p})^{2} \quad (7)
where d_o^p is the desired output for unit o when pattern p is clamped.
Training a network by back-propagation in a neural network consists of
two steps. The first step is the propagation of input signals forward through the
network. The second step is the adjustment of weights based on error signals
propagated backward through the network. As shown in Equation (7), the
performance of the system is measured by a mean square difference between the
desired target outputs and the actual outputs.
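The two-step cycle above can be sketched in Python (rather than the project's MATLAB toolbox). The network size, weights, input and learning rate below are illustrative values, not taken from the project:

```python
import math

# Minimal sketch of back-propagation for one hidden layer and one output
# unit, using the sigmoid activation and the generalised delta rule.

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(x, w_hid, b_hid, w_out, b_out):
    # Step 1: propagate the input signal forward through the network.
    h = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
         for w, b in zip(w_hid, b_hid)]
    y = sigmoid(sum(wo * hi for wo, hi in zip(w_out, h)) + b_out)
    return h, y

def backward(x, d, w_hid, b_hid, w_out, b_out, gamma=0.5):
    # Step 2: propagate the error backward and adjust weights and biases.
    h, y = forward(x, w_hid, b_hid, w_out, b_out)
    delta_out = (d - y) * y * (1.0 - y)            # output-unit error
    for j, hj in enumerate(h):
        delta_hid = hj * (1.0 - hj) * delta_out * w_out[j]  # hidden-unit error
        w_out[j] += gamma * delta_out * hj
        for i, xi in enumerate(x):
            w_hid[j][i] += gamma * delta_hid * xi
        b_hid[j] += gamma * delta_hid
    return w_hid, b_hid, w_out, b_out + gamma * delta_out

x, d = [1.0, 0.0], 1.0
w_hid, b_hid = [[0.1, -0.2], [0.3, 0.4]], [0.0, 0.0]
w_out, b_out = [0.5, -0.5], 0.0
_, y0 = forward(x, w_hid, b_hid, w_out, b_out)
w_hid, b_hid, w_out, b_out = backward(x, d, w_hid, b_hid, w_out, b_out)
_, y1 = forward(x, w_hid, b_hid, w_out, b_out)
# After one update, the squared error (d - y)^2 on this pattern shrinks.
```

Repeating the two steps over many patterns is what Equation (7) measures with the mean square difference between targets and actual outputs.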
2.4.8.2. Unsupervised Learning: Kohonen Network/ Self -Organizing
Maps (SOM)
These networks can learn to detect regularities and correlations in their
input and adapt their future responses to that input accordingly. The neurons of
competitive networks learn to recognize groups of similar input vectors. Self-
organizing maps learn to recognize groups of similar input vectors in such a way
that neurons physically near each other in the neuron layer respond to similar
input vectors.
The Kohonen layer (Kohonen 1984, 1988) is a “Winner-take-all” (WTA)
layer. Thus, for a given input vector, only one Kohonen layer output is 1 whereas
all others are 0. No training vector is required to achieve this performance. Hence,
the name: Self-Organizing Map Layer (SOM-Layer).
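The winner-take-all behaviour can be illustrated with a short sketch (in Python rather than MATLAB; the weight vectors and input are made-up values):

```python
# Winner-take-all: exactly one Kohonen-layer output is 1 (the unit whose
# weight vector is closest to the input); all other outputs are 0.

def wta(x, weights):
    # Squared Euclidean distance from the input to each unit's weights.
    dists = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in weights]
    k = dists.index(min(dists))                  # index of the winning unit
    return [1 if j == k else 0 for j in range(len(weights))]

weights = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]   # three Kohonen units
out = wta([0.9, 0.8], weights)
# Only the unit nearest the input fires.
```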
The Kohonen network (Kohonen, 1982, 1984) can be seen as an extension
to the competitive learning network, although this is chronologically incorrect.
Also, the Kohonen network has a different set of applications.
In the Kohonen network, the output units in S are ordered in some fashion,
often in a two-dimensional grid or array, although this is application-dependent.
The ordering, which is chosen by the user, determines which output neurons are
neighbours.
Now, when learning patterns are presented to the network, the weights to
the output units are adapted such that the order present in the input space ℝ^N
is preserved in the output, i.e., in the neurons in S. This means that learning patterns
which are near to each other in the input space (where ‘near’ is determined by the
distance measure used in finding the winning unit) must be mapped onto output
units which are also near to each other, i.e., the same or neighbouring units. Thus,
if inputs are uniformly distributed in ℝ^N and the order must be preserved, the
dimensionality of S must be at least N. The mapping, which represents a
discretisation of the input space, is said to be topology preserving. However, if
the inputs are restricted to a subspace of ℝ^N, a Kohonen network of lower
dimensionality can be used. For example, data on a two-dimensional manifold in a
high-dimensional input space can be mapped onto a two-dimensional Kohonen
network, which can, for example, be used for visualisation of the data (Kröse and
Smagt, 1996).
Usually, the learning patterns are random samples from ℝ^N. At time t, a
sample x(t) is generated and presented to the network. Using the formulas for
competitive learning, the winning unit k is determined. Next, the weights to this
winning unit as well as its neighbours are adapted using the learning rule
w_{o}(t+1) = w_{o}(t) + \gamma \, g(o, k) \, (x(t) - w_{o}(t))
Here, g(o, k) is a decreasing function of the grid-distance between units o and k,
such that g(k, k) = 1. For example, a Gaussian function can be used for g(o, k),
such that (in one dimension!) g(o, k) = \exp(-(o - k)^{2}). Due to this collective
learning scheme, input signals which are near to each other will be mapped onto
neighbouring neurons. Thus the topology inherently present in the input signals
will be preserved in the mapping.
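The learning rule above can be sketched as follows (a Python illustration with made-up weights and sample; the project itself uses the MATLAB toolbox):

```python
import math

# One step of the Kohonen update: the winner k and its grid neighbours
# move toward the sample x(t), weighted by the one-dimensional Gaussian
# neighbourhood g(o, k) = exp(-(o - k)^2).

def som_step(x, weights, gamma=0.5):
    # Winner = unit whose weight vector is closest to x.
    dists = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in weights]
    k = dists.index(min(dists))
    for o, w in enumerate(weights):
        g = math.exp(-float(o - k) ** 2)   # grid-distance weighting, g(k,k)=1
        for i in range(len(w)):
            # w_o(t+1) = w_o(t) + gamma * g(o,k) * (x(t) - w_o(t))
            w[i] += gamma * g * (x[i] - w[i])
    return k

weights = [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]   # three units on a 1-D grid
k = som_step([0.9, 0.9], weights)
# The winner moves most; its neighbours move less, preserving topology.
```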
Self-organizing maps (SOM) learn to classify input vectors according to
how they are grouped in the input space. They differ from competitive layers in
that neighboring neurons in the self-organizing map learn to recognize
neighboring sections of the input space. Thus, self-organizing maps learn both the
distribution (as do competitive layers) and topology of the input vectors they are
trained on (Demuth and Beale, 1992).
The neurons in the layer of an SOM are arranged originally in physical
positions according to a topology function. The functions gridtop, hextop, and
randtop arrange the neurons in a grid, hexagonal, or random topology
respectively. Distances between neurons are calculated from their positions with
a distance function. There are four distance functions: dist, boxdist, linkdist,
and mandist. Link distance is the most common. These topology and distance
functions are described in Topologies (gridtop, hextop, randtop) and Distance
Functions (dist, linkdist, mandist, boxdist) (Demuth and Beale, 1992).
CHAPTER THREE
METHODOLOGY
The basic flowchart of the software procedure of this project is shown in Figure
3.1, while a more comprehensive one is shown in Figure 3.2. It can be seen that the
software implementation is divided into perception, reasoning and action.
3.1. Chassis Fabrication
Before anything else can be done, a vehicle chassis must be available. To avoid
mechanical complexity in the axle and gearing system, it was decided that gears and
axles would be avoided except where absolutely necessary. Based on this rationale,
the vehicle is turned right or left using the principle of differential drive.
The chassis was constructed with the following steps:
1. The base was cut out from fibre glass material having dimensions of 18cm by
24cm.
2. The diagonals of the base were marked out.
3. The tyres were fabricated from wood of thickness 2 cm and are 10 cm in
diameter. Wood was chosen for the tyres to ensure rigidity of the vehicle.
To ensure proper friction, the curved surfaces of the tyres are to be lined
with thin rubber (Dunlop) material to aid traction.
4. The tyres were attached at the middle to a plywood piece onto which the
motors (Dynamixel AX – 25f) were mounted. The servo-motors (with the
tyres attached) were fixed at points equidistant from the vertices and along
the diagonals of the fibre-glass base. This is to ensure that the centre of
mass of the whole assembly is located at the centroid.
[Figure: flowchart for autonomous driving. Start → image acquisition →
pre-processing (conversion to grayscale, thresholding, conversion to a binary
image, conversion of the image matrix to a column vector, compounding into an
input matrix and, for supervised learning, corresponding target values) → either
supervised learning with a Multi-Layer Perceptron or unsupervised learning with a
Kohonen network, each initialized, trained, validated/tested and simulated →
take the necessary actions.]
Figure 3.2: A More Comprehensive Algorithm for the Robot
This is also to ensure that the vehicle is stable and does not tip over while
moving. The servo-motors were attached to the base with a strong adhesive.
5. Servo-motors were attached to the front tyres only; the back tyres were left
free. Although this reduces the effective torque compared with driving all
four tyres, it results in lower cost and a less complex control algorithm. A
picture of the chassis is shown in Figure 3.3.
3.2. Vision, Image Acquisition and Processing
The following steps were taken in Image Acquisition and Processing:
1. Initialization of the Image Acquisition Device.
i. Plug the Logitech webcam into the laptop, which has MATLAB 2009b
installed on it.
ii. Install and configure the webcam.
iii. Retrieve information that uniquely identifies the Logitech webcam to
the MATLAB Image Acquisition Toolbox software.
iv. Delete any image acquisition object that exists in memory and unload
all adaptors loaded by the toolbox, by resetting the image acquisition
hardware with the command:
imaqreset
imaqreset can be used to force the toolbox to search for new
hardware that might have been installed while MATLAB was running.
2. A reference image first has to be defined. This is the standard image against
which the vehicle compares all other acquired images. The steps involved include:
i. Create a video input object. The DeviceInfo structure returned
by the imaqhwinfo function contains the default videoinput
function syntax for a device in the ObjectConstructor field.
This is done with the following command:
ed = videoinput('winvideo',1);
ii. Configure image acquisition properties
triggerconfig(ed,'manual'); % makes the triggering
manually activated by a command
set(ed,'TriggerRepeat',1); % allows triggering to
be done only once
ed.framesPerTrigger = 1; % makes just one frame
captured per trigger
iii. Start the object running, trigger the object and then acquire the
image data. Starting the object does not imply that data is being logged;
issuing the start command gives the program exclusive use
of the data acquisition device.
The trigger function initiates data logging for the video input
object. To issue the trigger function the TriggerType property
has to be set to ‘manual’ and the object must have been started.
The data logged is hereafter acquired into the MATLAB workspace
for manipulation and processing. The following lines of code
illustrate the above explanations:
start(ed); %start the object
trigger(ed) % initiate data logging
init = getdata(ed,1); % acquire logged data
iv. Convert the acquired image from the RGB colour space to a binary
image. This allows for ease of post-processing.
inited = im2bw(init, 0.1); %converts the image
acquired to a binary image.
v. Normally, images acquired by an image acquisition device are
represented by a four-dimensional array (H-by-W-by-B-by-F), where
H represents the image height,
W represents the image width,
B represents the number of colour bands, and
F represents the number of frames returned.
A way of converting this 4-D array into a 1-D array has to be found,
since the inputs to our neural network have to be in a 1-D format.
The following code makes the columns of the four dimensional
matrix successively strung out into a single long column:
comp = inited(:); %converts the 4-D array
into a 1-D array
vi. The object is stopped finally.
stop(ed);
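Step (v) above can be illustrated outside MATLAB as well. The following Python sketch emulates MATLAB's column-major inited(:) on a small, made-up 2-by-2-by-1-by-1 array:

```python
# MATLAB's (:) operator strings the array out in column-major order:
# the first (row) index varies fastest, then columns, bands and frames.
# The 2x2x1x1 array below is an illustrative stand-in for a binary image.

H, W, B, F = 2, 2, 1, 1
img = [[[[1]], [[3]]],          # img[row][col][band][frame]
       [[[2]], [[4]]]]

def colon(arr4d, H, W, B, F):
    # Emulate arr(:): earlier dimensions vary fastest.
    return [arr4d[h][w][b][f]
            for f in range(F) for b in range(B)
            for w in range(W) for h in range(H)]

comp = colon(img, H, W, B, F)
# Column-major order walks down each column first: [1, 2, 3, 4].
```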
3.3. Developing the Neural Network with Multi-Layer Perceptron
(Method 1)
This begins with initialization of the Multi-Layer Perceptron with the back-propagation
algorithm and suitable architecture, training and transfer functions. A dataset, which
is an array of image inputs and corresponding targets (parameters passed to the
Dynamixel actuators), is first acquired. This dataset is then used to train the neural
network with the back-propagation algorithm. After the training, the neural network is
validated and then tested. The main tool used here is the MATLAB Artificial Neural
Network Toolbox. The following steps were taken:
1. The first task is to compile the dataset used to train the neural network. The learning
mode adopted for the neural network is supervised learning. This involves feeding
the neural network various types of input and giving it the corresponding
target actions to take. To train the network sufficiently, enough input–target
pairs have to be specified. Conventionally this is done by a teacher who enters
the target value for every input condition encountered. To save the drudgery of
doing this for over a thousand or so input conditions, it was decided to
automate the process.
To get the dataset, the various tasks carried out in step 2 above have to be
repeated, but the image acquisition properties have to be changed for the dataset
object.
% Create video input object.
imaqreset % resets image acquisition device
vid = videoinput('winvideo',1); %define a new object
apart from the one above
% Set video input object properties for this
application.
% Note that example uses both SET method and dot
notation method.
set(vid,'TriggerRepeat',100); % allows triggering to
be repeated a hundred times
triggerconfig(vid,'manual'); % allows manual
triggering
vid.FrameGrabInterval = 5; % acquire every fifth
frame from the video stream
vid.FramesPerTrigger = 1; % acquires one frame per
trigger, to allow each image to be processed
To specify the target every time a frame is captured, a correlation function is
applied between the newly acquired image and the reference image. If the
correlation coefficient lies within 0.5 of zero, the vehicle is seeing a similar
image, and the target parameter set moves the vehicle forward; if the
coefficient is far from zero, the image is different and the vehicle has to
move back.
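This automatic labelling rule can be sketched in Python (the bit vectors stand in for the binary images; the 0.5 threshold follows the description above):

```python
import math

# Label each captured frame by correlating it with the reference image.
# Per the rule above: |r| < 0.5 -> similar scene -> 'forward',
# otherwise -> different scene -> 'back'.

def corr_coef(a, b):
    # Pearson correlation coefficient between two equal-length vectors.
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def label(new, ref):
    r = corr_coef(new, ref)
    return 'forward' if abs(r) < 0.5 else 'back'

ref = [1, 0, 1, 0, 1, 0]     # reference bit image (illustrative)
new = [1, 0, 0, 1, 1, 0]     # newly captured bit image (illustrative)
target = label(new, ref)
```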
2. Step 1 normally has to be repeated about a thousand times, and the results
compiled into an array, to make up the dataset used to train the neural network.
However, the loop was made to iterate only ten times to save time and space.
3. The penultimate stage entails configuring, training, validating and testing the
artificial neural network. The algorithm used is a pattern recognition algorithm.
Out of the dataset:
70% is used to train the network,
15% is used to validate the network, and
15% is used to test the network.
The performance graph of the neural network is plotted to check whether the
network has been well trained. The confusion matrix is also plotted to check
whether the network has fared well in recognizing patterns.
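The 70/15/15 partitioning can be sketched as follows (a Python illustration on twenty dummy samples standing in for the image/target pairs):

```python
# Partition a dataset into 70% training, 15% validation and 15% test
# subsets, as described above.

def split(dataset, train=0.70, val=0.15):
    n = len(dataset)
    n_train = round(n * train)
    n_val = round(n * val)
    return (dataset[:n_train],                     # 70%: train the network
            dataset[n_train:n_train + n_val],      # 15%: validate it
            dataset[n_train + n_val:])             # 15%: test it

samples = list(range(20))                          # dummy stand-in samples
train_set, val_set, test_set = split(samples)
# 20 samples -> 14 / 3 / 3, with nothing lost or duplicated.
```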
3.4. Developing the Neural Network with Self Organizing Maps
(Preferred Method)
It was realized that the Multi-Layer Perceptron was not well suited to image
recognition and classification, so the use of Self-Organizing Maps (SOM) was
employed. As discussed earlier, SOM is a type of Competitive Neural Network
(CNN) which employs an Unsupervised Learning algorithm. The images to be
classified are fed into the Neural Network as input and the number of output
neurons is specified. The Network activates one of the several output neurons
depending on the minimum distance between the input and the weight.
To use this method in an autonomous application, images had to be acquired
using the code in Appendix I. The images acquired (shown in Appendix IX) were
used to generate an input matrix for the SOM. Since this type of neural network
does not need target specification, a new SOM was defined with a hexagonal
topology having 3-by-1 neurons. The code used is shown in Appendix III. The
training algorithm used is batch unsupervised weight/bias training (trainbuwb),
and the network was trained over 100 iterations.
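As a rough Python stand-in for the MATLAB SOM above (this sketch uses simple sequential updates rather than the batch trainbuwb algorithm; the two-dimensional inputs are dummy stand-ins for the image column vectors, while the 3-by-1 map and the 100 iterations follow the text):

```python
import math, random

# Train a 3-by-1 SOM for 100 iterations; afterwards each input activates
# exactly one of the three output neurons.

random.seed(0)
data = ([[random.gauss(0.1, 0.02), random.gauss(0.1, 0.02)] for _ in range(10)] +
        [[random.gauss(0.5, 0.02), random.gauss(0.5, 0.02)] for _ in range(10)] +
        [[random.gauss(0.9, 0.02), random.gauss(0.9, 0.02)] for _ in range(10)])

weights = [[0.2, 0.2], [0.5, 0.5], [0.8, 0.8]]   # 3-by-1 map, made-up init

def winner(x, weights):
    # The unit at minimum distance from the input wins.
    d = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in weights]
    return d.index(min(d))

for t in range(100):                              # 100 training iterations
    for x in data:
        k = winner(x, weights)
        for o, w in enumerate(weights):
            g = math.exp(-float(o - k) ** 2)      # neighbourhood function
            for i in range(len(w)):
                w[i] += 0.1 * g * (x[i] - w[i])

labels = [winner(x, weights) for x in data]       # cluster id per sample
```

After training, the three dummy clusters each activate a distinct output neuron, mirroring how the real SOM classifies the acquired scenes.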
3.5. Scene Identification
Lastly, the network is tested by inputting images through the webcam, randomly
and at intervals, to see whether it can specify its target by itself, independent of the
earlier code and the teacher.
3.6. Locomotion
After the platform has been set up, a working control board for the servo-motors has
to be fabricated. The Dynamixel servo-motor uses TTL serial technology to interface
with the outside world. But because the preferred communication technology with the
motors is USB, a Dynamixel-to-USB converter had to be acquired and its driver
installed on the laptop, which is the main controller. The mechanism for turning
the robot is the differential drive: to turn right, the left wheel is made to
accelerate while the velocity of the right wheel is kept constant.
Conversely, to turn left, the right wheel is made to accelerate while the left wheel is
made to rotate at a constant velocity.
The necessary controls to the differential drive system are the outputs obtained
from the neural network.
3.7. Steering
MATLAB has been chosen to send commands to the Dynamixel. Since the output of the
neural network is stored in MATLAB, it is easily used in the movement of
the robot. Control commands have been written to steer the robot using the differential
drive method, and library functions have been developed for the various steering
and navigation tasks (forward movement, backward movement, stop, right turn, left
turn). Forward movement is achieved by moving both front wheels at the same speed
in the forward direction; backward movement, by making the wheels move at the
same speed in the reverse direction. A left turn is accomplished by moving the right
wheel forward and the left wheel in the reverse direction at different speeds; a right
turn is the direct opposite. The various functions for movement of the car are shown
in Appendix IV.
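The steering library can be sketched as follows (a Python illustration, not the Appendix IV code; the function names and wheel-speed values are assumptions, following the differential-drive rules described above):

```python
# Each function returns (left_speed, right_speed) wheel-velocity commands
# of the kind passed to the actuators; positive means forward.

BASE = 100  # nominal wheel speed, an assumed illustrative unit

def forward():                      # both wheels forward, same speed
    return (BASE, BASE)

def backward():                     # both wheels reversed, same speed
    return (-BASE, -BASE)

def stop():
    return (0, 0)

def left_turn():                    # left wheel reverse, right wheel forward,
    return (-BASE // 2, BASE)       # at different speeds

def right_turn():                   # the direct opposite of a left turn
    l, r = left_turn()
    return (r, l)
```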
CHAPTER FOUR
RESULTS AND DISCUSSIONS
4.1. Simulations and Results
4.1.1. Multi-Layer Perceptron Neural Network
The visual part of the robot was made to observe a particular scene as well as
scenes different from the one it is accustomed to. The images acquired were processed
and used to test the neural network. Thereafter the network was subjected to a
call-back process wherein it observes scenes by itself and outputs a signal. In this
simulation, the output signal is a WAV file: when the network observes the scene it is
accustomed to, it plays a particular file and makes the robot move in a defined way;
when it sees something different, it plays another file and the robot undertakes a
different movement scheme. MATLAB is the tool used for this process, and the code
used is shown in Appendix II.
The images acquired for training are shown in Appendix IX. It can be seen that
Images 2, 3, 4, 7, 8, 9, 10 and 12 are identical. These represent the accustomed scene.
Images 1, 5, 6, and 11 are variations in scene. The outputs given by the network
during the testing process are shown in Table 3.1 together with the desired outputs.
It can be seen that there are errors in classifying images 7 and 9.
The network has two layers and one output unit. It uses the mean square error as
its error function and the scaled conjugate gradient method for training. It used
images 2, 4, 5, 6, 7, 8, and 9 for training; images 3 and 10 for validation; and
images 1 and 11 for testing. The network went through 14 iterations before it
reached the minimum gradient. The GUI showing the state of the neural network is
shown in Appendix III. The confusion matrix is