Business Dissertation Thesis
The Application of Artificial Neural Networks to Business Problems
Bryan Mills
University of Plymouth Business School
(Franchised to Cornwall College)
Honours project submitted as partial fulfilment for the degree of
BA Honours in Business Administration
Supervisor: Dr Jon Tucker
14th May 1997
Acknowledgements:
Firstly I would like to thank my supervisor Dr Tucker for his patience and generosity
and, in addition, to acknowledge the contribution he has made to this dissertation. I would
also like to thank Dave Ager, Jill Ferret and Mike Trennary for their tolerance and
encouragement during both the dissertation and the degree programme. I would like to
acknowledge the encouragement I have received throughout the degree from Buzz Banks,
Helen Cobbin and Ken Waller. Also, I would like to take the opportunity to thank Paul
Ingram for his uncompromising and contagious obsession with academia and Tony Butt for
first introducing me to Chaos Theory and non-linearity.
Bryan Mills 1997
Abstract:
Artificial Neural Networks (ANNs) provide a powerful information technology
based tool for decision making purposes. However, present literature on the subject
is often found to be either inaccessible or of limited relevance to (general) business
application. In this report ANNs are described in a more intuitive manner than found
within much of the existing literature. Emphasis is placed upon the use of ANNs
within the business environment, although the study still provides an introduction for
wider application. Misconceptions surrounding ANNs, and Artificial Intelligence in
general, are explored and recommendations are made with a view to their resolution.
The advantages and disadvantages of ANNs are discussed and present applications
are listed with a view to demonstrating the various application possibilities of ANNs.
To enable wider application of ANNs within business, and to reduce misguided
application, a schema has been developed. This schema, which has been developed
as both a flowchart and a computer program, allows the potential ANN user to
critically appraise the use of ANNs for a given decision making problem.
Contents:
Acknowledgements:
Abstract:
List of Diagrams:
List of Tables:
Glossary of Terms:
Chapter 1 - Introduction
General Introduction
Popular Misconceptions Concerning Neural Networks:
Chapter 2 - Discussion of Aims, Methodology and Research Philosophy
Aim:
Objectives:
Benefit of the project to industry and commerce:
The growth of research in the neural area:
Methodology and Approach:
Schema development:
Chapter 3 - Explanation of the Fundamental Concepts of ANNs
Introduction:
An outline explanation of the fundamental concepts of Artificial Neural Networks:
Knowledge Based Systems:
The difference between Artificial Neural Networks and Conventional Knowledge Based Systems:
Explanation of the operation of ANNs:
First Principles:
Knowledge Based Systems:
Rule Based System:
Artificial Neural Networks:
Overview:
Components:
Nodes:
Weights and bias terms:
Generalisation:
Choice of mapping or activator function:
Data pre-processing:
Training:
Topology:
The Multilayer Perceptron - an example of supervised learning/training:
The Kohonen self organising net - an example of unsupervised learning/training:
Summary:
Chapter 4 - Investigation into advantages, disadvantages and current application of ANNs
Introduction:
Advantages and disadvantages:
Current application of ANNs:
Summary:
Chapter 5 - Schema for the assessment of the suitability of ANNs for given problem
Introduction:
Schema:
Explanation of Schema:
Summary:
Chapter 6 - Conclusions and Recommendations
Conclusion:
Limitations:
Further Research
Appendix 1 - Example of Training Process
Appendix 2 - Bayesian Updating:
Appendix 3 - Instructions for Running The Computer Program:
Appendix 4 - Computer Code-list
Appendix 5 - Sample Output:
Appendix 6 - Visual Basic as a Programming Language:
Bibliography:
List of Diagrams:
Diagram 1 - Patent Activity
Diagram 2 - Methodology
Diagram 3 - Knowledge Based System
Diagram 4 - Single Neuron Calculation
Diagram 5 - Representation of a Neuron
Diagram 6 - Screen dump of a text file for use in WinNN
Diagram 7 - Class Membership
Diagram 8 - Universe of objects
Diagram 9 - Sigmoid Function
Diagram 10 - The Multilayer Perceptron
Diagram 11 - Kohonen Self Organising Feature Map
Diagram 12 - The operation of ANNs - flow diagram
Diagram 13 - Schema
List of Tables:
Table 1 - Sample Problem
Table 2 - Simplified weight method
Table 3 - Input file explanation
Table 4 - Sigmoid values
Table 5 - Data pre-processing
Glossary of terms:
Activator function - an equation (mapping function) which describes a neuron’s
internal state as the total of its inputs; $net = \sum_i x_i w_i - \theta$, where x is an
input, w is a weight and θ is the bias term.
Algorithm - a procedure or series of steps used to solve a problem
Autoassociative - mapping the original pattern from noisy or incomplete data
Backpropagation - an algorithm which compares results with expected answers and
then passes the difference back through the network to facilitate
weight adjustment.
Bias term - A systematic error (θ) introduced to each node independently to allow
control over the otherwise independent node output.
Cell - A neuron
Database - In this instance, a set of facts (data) stored within a computer system
Dependent variable - A variable which will be altered or created by the change in
value of an independent variable(s). Normally shown on the left
hand side (LHS) of an equation.
EPOS - Electronic point of sale - the computer connection between cash-tills and the
central computer within a retail store
EPS - Earnings per share (accountancy measure)
Front-end subsystem - A computer program designed to simplify (humanise) the
input and output of data
Fuzzy - A set whose members belong to it to some degree. In contrast a standard set
contains its members either all or none (Kosko, 1993).
Function - A rule which maps one set element onto a different element in another set;
sales level could be said to be a function of demand
Generalise - The ability to identify a wide range of objects, patterns etc. from a
minimal set of key descriptive data
Heteroassociative - mapping input pattern set to different output pattern set
Hyperplane - A plot involving more than 3 dimensions and therefore difficult to
represent graphically
Independent variable - A variable which will alter or create the change in value of a
dependent variable(s). Normally shown on the right hand side (RHS)
of an equation
Inference engine - The part of a knowledge based system’s programming which
deduces results from given facts/data
Knowledge based system - The separation of data and control (algorithms) allowing
the computer to respond to a series of differing inputs by calling on a
library of information (knowledge base) as opposed to altering
variables contained explicitly within the program’s structure.
Mapping function - A rule linking the elements of one set to those of another; usually
shown as F:x→y; the function which maps the x onto y.
Multivariable - Containing a large number of independent variables
Network - A collection of interconnected nodes forming a topology
Neuron - A single activator function, a processing element, a mapping function
through which variables must pass, a calculation point
Nodes - Neurons
Non-linearity - Equations containing powers, roots, trigonometric or logarithmic
functions.
Normalisation - A form of data pre-processing which seeks to give all inputs/outputs
a commonality by constraining their values to within a pre-
determined range
Pre-processing - Alterations to data before use (normalisation, removal of outliers,
ratio splitting). Usually conducted with the intention of increasing
the network’s efficiency or converting non-numeric data to
numeric form.
Propositional logic/calculus - A step by step inference system for determining
whether a given proposition is true or false. There are various forms
of propositional logic (modus ponens, modus tollens, denial of
antecedent etc.), but all are based on variations of: If x is true then
y must be true/false, If and only if x is true then y is true/false etc.
(Eysenck and Keane, 1995).
Ratio splitting - Using the component parts of a ratio separately as opposed to using
the result (GPMargin = GP/Sales; use GP and Sales as input not GP
Margin)
Real-time - The collection and processing of data as events occur as opposed to the
use of historic data. EPOS works in real-time
ROCE - Return on capital employed (accountancy measure)
Set - A collection of elements defined by a rule which makes them separable from
other sets - e.g. men and women are two separate sets (separated by
sex) but are also within the common set of humans (separated from
other animal forms by species)
Sigmoid - A common ANN activator function. An equation which has the effect of
reducing all independent variables to an answer between 0 and 1
(tending towards, but never reaching, either value) and is
generally given thus:
$$f_{nonlinear}(x) = \frac{1}{1 + e^{-x}}$$
where x is summed input and e is the mathematical constant that is the
base of natural logarithms (2.71828...)
Topology - In this instance an attempt to graphically represent the interconnection of
nodes within the network. Topology is often one of the key
distinguishing features separating different ANNs (others being
training method and activator function)
Training method - As ANNs self learn by exposure to data it is necessary to have an
algorithm which allows the ANN to distinguish between correct and
incorrect responses. This may either be supervised (told when
incorrect and what should have been the output), unsupervised (self
learning pattern recognition) or reinforced (told simply if correct or
incorrect)
Training set - A collection of data used to train the ANN, usually separated into a
training set and a hold out or test set
Vector - A quantity which has both magnitude and direction. An ANN’s input consists
of a one-dimensional array of differing x values, combined with the weights in the form
$x_1w_1 + x_2w_2 + x_3w_3 + \dots + x_nw_n$, where x indicates input and w indicates
weight
Weights - A value which is altered by the ANN to enable the emphasis of the variable
upon which it acts to be either strengthened or weakened. A variable
coefficient which determines strength of an input’s effect on output
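The glossary entries for activator function, sigmoid and normalisation can be tied together in a short sketch. The Python below is illustrative only: the function names (`normalise`, `sigmoid`, `neuron_output`) and the sample figures are assumptions made for this example and are not taken from the computer program described in this dissertation.

```python
import math


def normalise(values):
    """Min-max normalisation: rescale raw values into the range 0 to 1."""
    low, high = min(values), max(values)
    if high == low:
        # All values identical: there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def sigmoid(x):
    """Sigmoid activator function: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, theta):
    """Single neuron: net = sum(x_i * w_i) - theta, passed through the sigmoid."""
    net = sum(x * w for x, w in zip(inputs, weights)) - theta
    return sigmoid(net)


# Hypothetical raw inputs (for example, three monthly sales figures).
raw_sales = [10.0, 20.0, 30.0]
scaled = normalise(raw_sales)  # [0.0, 0.5, 1.0]

# Hypothetical weights and bias term; the output always lies between 0 and 1.
result = neuron_output(scaled, weights=[0.4, -0.2, 0.6], theta=0.5)
```

The weights and bias term here are fixed by hand; in a working ANN they would be set by the training method rather than chosen by the user.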
Chapter 1 - Introduction
General Introduction
Business involves a complex mix of people, policy and technology, and exists
within the constraints of economics and society (Clifton and Sutcliffe, 1994). It is
often the precise way in which these items are mixed that can create either success or
failure for an organisation. This presents the manager with two key tasks: the
efficient collection and the analysis of all relevant information. From this analysis the
manager will be able to formulate strategies, define objectives and implement plans
for their fruition. The provision and analysis of information, within business, is often
referred to as the decision support process and the methodology adopted referred to as
decision support systems (DSS).
Business decisions can often be viewed as the solution of various
mathematical problems. Whether it be determining the price level of a product, the
benefit of expansion into a new market, staff levels or the probability of a project
failing, mathematics usually plays a role. In fact, due to the overriding objective of
“maximising shareholders wealth” (McLaney, 1994) found within all profit making
organisations, it can be said that, as wealth/profit is measured numerically, it would
be difficult, if not impossible, to view the organisation meaningfully in any other
way [1].
[1] Non-profit making organisations seek cost efficiency - another mathematical measure.
One of the key problems in any decision is the availability and cost of
“perfect information”. Given perfect information (all the facts concerning a decision
with complete confidence in these predictions being correct) there would be little for
the business manager to decide; it would simply be a choice of the project which
maximised overall contribution [2]. Most decisions, however, are not based on perfect
information. This is generally due to a combination of the prohibitive cost of
gathering such information, the availability of information and the intrinsic
unpredictability and complexity of the markets in which business operates.
Ongoing developments in the field of Information Technology have enabled
the gathering, storage and retrieval of much larger quantities of information than was
previously possible. Stock Markets can be observed in real-time, supermarkets know
the exact quantities of goods on their shelves (via Electronic Point of Sale (EPOS))
and their customers’ weekly shopping lists (via Loyalty Cards), and companies can
measure the exact output of machines on the shop floor (via Computer Aided
Manufacturing). This information is, however, worth only as much as the gain
derived from its ownership. To be able to quote a share price or stock level is fine,
but the information has already become historic. What is required in decision making
is a means by which to identify patterns and trends in the large volumes of data
currently available, and to increase the confidence in the predictability of this data to
an acceptable level.
The capability of Artificial Neural Network (ANN) models to recognise
patterns and trends in large volumes of data has meant that they are
increasingly used for a variety of industrial/commercial applications.
[2] Overall contribution - the manager would consider the organisation’s other ventures, market share, market growth and long
term survival in his/her decision.
ANNs are a form of computer software which took their original inspiration
(McCulloch and Pitts, 1943) from man’s limited understanding of the workings of the
human brain. Research has been carried out in this area for two broad reasons. The
first and original was an attempt to model the human brain electronically to develop
a greater understanding of its operation. The second, and most relevant in this
instance, is the development of these models as a mathematical tool for studying
patterns and relationships in data.
The mathematics which form these models are particularly useful when
dealing with non-linear problems, that is, problems which cannot be graphed by use of a
straight line, of which there are numerous examples in business (demand/price,
production level/cost, share price/ROCE/EPS - a change in the independent
variable (price) does not guarantee a proportionate change in the dependent
(demand)). ANNs are also capable of dealing with dependent variables which may
have several variables acting on them (e.g. interest rates, inflation and estimation of
risk - in cost of capital calculation), the relationship between each being both
theoretically appreciated and explainable but not easily converted into an equation or
algorithm (Klimasauskas, 1991 and Scocken, 1994).
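As a small illustration of what "cannot be graphed by use of a straight line" means, the sketch below compares a straight-line relationship with a curved one. The demand curve, its parameters and the helper names are hypothetical, chosen only for this example; they are not drawn from any of the cited sources.

```python
def demand(price, scale=1000.0, elasticity=1.5):
    """Hypothetical non-linear demand curve: scale * price ** -elasticity."""
    return scale * price ** -elasticity


def is_straight_line(points, tol=1e-9):
    """True when every (x, y) point lies on the line through the first two."""
    (x0, y0), (x1, y1) = points[0], points[1]
    slope = (y1 - y0) / (x1 - x0)
    return all(abs((y - y0) - slope * (x - x0)) <= tol for x, y in points[2:])


prices = [1.0, 2.0, 3.0, 4.0]

# A linear relationship: each unit rise in price removes the same demand.
linear_points = [(p, 500.0 - 100.0 * p) for p in prices]

# A non-linear relationship: the effect of a price rise depends on the level.
curved_points = [(p, demand(p)) for p in prices]

linear_check = is_straight_line(linear_points)  # True
curved_check = is_straight_line(curved_points)  # False
```

In the curved case, equal steps in price produce unequal steps in demand, which is the situation the paragraph above describes.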
It is the ability to deal with non-linearity, multivariables and large volumes of
data which gives ANNs what are perhaps their most impressive features - pattern
recognition and self learning. ANNs receive their information (their knowledge) via a
process of training. Sets of data and desired results are passed through the network
until the computer is able to create, to a reasonable degree of accuracy, the desired
result. This is made possible by the network’s ability to generalise the training data
presented to it and form an output, given new inputs, based on this generalisation.
Once this training stage is complete a problem (independent variables) can be input
and a result (dependent variable) is generated.
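The training cycle described above can be illustrated with a deliberately small sketch. It is not the method of any particular ANN package: it assumes a single sigmoid neuron trained by a simple error-correction rule, and the data set, learning rate and number of passes are hypothetical values chosen for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(inputs, weights, theta):
    """Output of one sigmoid neuron: net = sum(x_i * w_i) - theta."""
    net = sum(x * w for x, w in zip(inputs, weights)) - theta
    return sigmoid(net)


def train(samples, learning_rate=0.5, passes=5000):
    """Repeatedly present (inputs, desired result) pairs and adjust the weights."""
    weights = [0.0] * len(samples[0][0])
    theta = 0.0
    for _ in range(passes):
        for inputs, desired in samples:
            error = desired - predict(inputs, weights, theta)
            # Strengthen or weaken each weight in proportion to its input.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            # theta is subtracted from net, so it moves the opposite way.
            theta -= learning_rate * error
    return weights, theta


# Hypothetical training set: the desired result is 1 only when both
# independent variables are 1, otherwise 0.
training_set = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, theta = train(training_set)

# Once trained, a problem (independent variables) is input and a result
# (dependent variable) is generated; rounding gives a yes/no answer.
outputs = [round(predict(x, weights, theta)) for x, _ in training_set]
```

A single neuron can only learn relationships of this simple kind; multi-layer networks trained by backpropagation extend the same idea of passing the error back to adjust the weights.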
Current application of ANNs includes, amongst others: stock and money
market forecasting (Trippi and Turban, 1996), face and handwriting recognition
(Rogers, Kabrisky, Ruck and Oxely, 1994), recognising whether station platforms are
busy or not, missile direction systems, voice recognition, voice control of computers,
data mining (Wiggins, 1994), industrial signal processing (Wiggins, 1994), modelling
of traffic flow (Recker, 1995), human resource management (redundancy selection)
(Coit, 1996), new product feasibility studies (Madu, 1995), risk evaluation, chemical
analysis, weather forecasting and resource management (Davalo and Naïm, 1991), a
complement to business decision support systems (Scocken, 1994), operations quality
control (Horridge, 1997) and the processing of marketing data.
Popular Misconceptions Concerning Neural Networks:
The subject of Artificial Neural Networks (ANNs) is an example of a name
not being self explanatory. The description ‘Artificial Neural Network’ is a
misnomer: it suggests an artificial representation of the human mind (it being
composed of a network of neurons). Exciting though the creation of an ‘artificial
mind’ would be, the ANNs currently in operation are little more than computer
programs capable of doing clever ‘sums’. The cleverness of these ‘sums’, however,
is not to be taken lightly. Systems have been developed which are able to identify
patterns in very large samples of data and to calculate relationships
between data where conventional mathematics would have been inadequate for
practical application. They also represent a very strong possibility for the development of
systems better suited to understanding our own fuzzy [3] world.
As a subject, ANNs are fairly inaccessible and fraught with misconceptions.
The subject is clouded by two separate, but interrelated explanations, and this
difficulty is further compounded by the absence of accessible knowledge. On the one
hand there are the works of various academics and academic institutions. On the
other is the general public’s [4] understanding of what ANNs represent - if they are
aware of them at all.
[3] Fuzzy - e.g. Language - hot, warm, cold mean different temperatures to different people and the boundary between hot and
warm (for example) is not clear (is 18 degrees warm, 19 degrees hot and just as hot as 28 degrees?)
[4] Used here simply to describe those outside of the fields of Mathematics, Computing and Psychology - not intended to be in
any way derogatory.
The public’s understanding stems mainly from the world of science fiction
and ‘popular’ science programmes. It is a world of Arthur C Clarke’s HAL (2001
etc.) and Philip K. Dick’s Bladerunner [5] - thinking machines which inevitably turn on
their creators, with devastating results [6]. This understanding is not assisted by the
anthropomorphic nature of the language surrounding ANNs and the willingness of
some academics to emphasise this definition (for example - Professor Aleksander,
Imperial College London - “Magnus [a computer program] has a mind of its own”
(Millar, 1996) [7]). The use of words such as ‘thinking’, ‘neuron’ and ‘understanding’
all point towards machines which, eventually, may replicate the human thought process
to the point of being conscious. The reality of the situation is quite different: at
present computers can represent little more than a few thousand neurons, compared to
10,000 in a cockroach’s brain and 100 billion in a human’s (The Economist, 1995).
[5] More correctly, the original book was called Do Androids Dream of Electric Sheep?
[6] The defence analyst and writer Warwick Collins has gone so far as to call on the government to restrict the human attributes
scientists can give programs/machines (Millar, 1996; The Guardian Newspaper, 17/12/96, page 4, eighth paragraph).
[7] The Guardian Newspaper, 17/12/96, page 4, second paragraph.
The academic world often uses anthropomorphic terms to overcome some of
the limitations of language and the mathematical nature of more correct descriptions.
For example, in the development of a computer system to control the heating, lighting
and ventilation of an office building one may be tempted to use expressions such as
“to develop a system which is aware of its environment”. However, use of the word
‘aware’ may suggest consciousness and use of ‘its environment’, as opposed to ‘the
environment in which it operates’, could suggest ownership and, therefore, existence
beyond being an object. The difficulty stems from the absence of a more correct, and
equally convenient, shorthand. The alternative - “to develop a system which
constantly monitors the surrounding environment and compares this information with
a pre-programmed set of ‘ideal’ conditions” - explains the process with a reduced
likelihood of confusion, but is not necessarily more accurate. The reader’s frame of
reference provides the key to which language would be more appropriate.
The use of such terminology creates few problems within the field because the
level of understanding is such that the words used often have two separate meanings -
the computer related meaning and the human related meaning - for example:
Neuron -
• Human related meaning - a cell which responds to various inputs by
producing responses - a processing unit.
• Computer related meaning - a part of an ANN computer program which
performs a calculation - a processing unit.
The definitions are similar and would appear to suggest that, if a significant number
of ‘computer neurons’ were assembled, a human brain could be replicated. Whilst
this formed the inspiration behind some of the early research in the field (for example
Rosenblatt 1958, 1961), modern theory points to a level of complication within the
human brain which makes the early optimism seem naive at best.
A more comprehensive discussion on matters of human and machine
consciousness is found in Penrose, 1988, The Emperor’s New Mind, and 1994, Shadows
of the Mind.
This thesis is intended to explain Artificial Neural Networks in such a way as
to reduce some of the confusion which often surrounds the topic. In addition it is
intended to simplify the application of ANNs (to a given problem) by the
development of a schema (both paper based and as a computer program). This schema
is intended to simplify the choice faced by the manager when considering which
mathematical tools to use in decision, classification and control problems. To
enable the full value of this schema to be realised the thesis begins with a
comprehensive review and simplification of existing literature. As previously
discussed, the confusion stems from three broad areas - media hype, anthropomorphic
descriptions and texts aimed at a specialist (scientific) reader - and it is intended that
this thesis will contribute towards redressing the balance.
Chapter 2 - Discussion of Aims, Methodology and Research Philosophy
Aim:
This study aims to develop a level of understanding from which the business
manager (who is unlikely to be an IT specialist) can establish the relative
merits/demerits of the ANN technique for business decision support analysis.
The project aims to make inroads into some of the more accessible academic
texts with a view to creating a more intuitive guide to ANN use aimed at the business
manager and student. To aid this explanation a schema or system will be developed
whereby the reader can assess the suitability of ANNs for a problem they wish to
solve. To assist in the discussion on the suitability of ANNs for given problems there
will also be an assessment of current uses and the advantages and disadvantages that
application presents.
Objectives:
1) To conduct a literature review of the fundamental concepts underlying ANNs.
2) To examine the existing use of ANNs.
3) To develop a system to enable problems to be assessed for the suitability of ANN
application.
Benefit of the project to industry and commerce:
Progress in the development of ANNs is closely tied to the development of
computer equipment. It is only within the past 5 years that computing power has
become cheap enough to make ANN use a viable possibility. However ANNs have
remained in the exclusive domain of the scientist and mathematician for the past 45
years and there are few accessible texts for the non-specialist.
The field of ANN contains possible solutions to business problems not fully
addressed by present mathematical techniques (Tucker, 1997). As Gleick (1993) and
Waldrop (1992) have commented, non-linear patterns are rife in the enormous
volumes of information produced by industry and commerce (e.g. the financial pages,
actuarial data, market research responses). ANNs enable the user to analyse this data
more accurately than traditional problem solving techniques, making them a
commercial advantage to many industrial sectors.
The growth of research in the neural area:
The field of ANNs is expanding rapidly. The expansion of the
subject is closely linked to technological developments in the IT field. As this area
continues to develop⁸ there will be an increasing expansion of opportunities in the
field of ANNs (Medsker, Turban and Trippi, 1996). Funding of research within the
field of ANNs is continuing, with the Japanese government having budgeted $250
million over the next 10 years, and the US government having pledged research funding
of $400 million over the next 6 years (The Economist, 1995).
[Chart: Patents Registered USA, by year from 1986 to 1992. Vertical axis: Number (0 to 300). Series: ANN, Comp. Int, Combined.]
Diagram 1 - Patent Activity
8 Moore’s Law suggests a doubling of the number of transistors on a chip every 18 to 24 months (J. Schofield, 1996, The
Guardian Newspaper (31/10/96) page 3 Online Section).
Diagram 1 shows patent activity in the USA for the years 1986-92. It can be
seen from the graph that the growth of work within this field is almost exponential. It
is also important to note that the full extent of the application of ANNs within business
(particularly finance) has yet to be realised (Farrar, Tucker and Bugmann, 1997).
Methodology and Approach:
The project is based mainly on a comprehensive literature survey and review
of texts within the field of ANNs. The literature search was conducted in the first
instance to develop a clear understanding of the subject. From this, a succinct
explanation of the concepts underpinning artificial neural networks, aimed at business
managers, has been produced. The greater understanding engendered by the literature
research provides the basis for an analysis of the advantages and disadvantages of the
use of ANNs and forms the foundation of the schema development.
Diagram 2 - Methodology
The above chart (Diagram 2) represents the flow of tasks from development of
original synopsis to the conclusions and recommendations.
Schema development:
The schema, which forms the most pragmatic part of the thesis, was
developed from the literature research. The schema seeks to answer the question
“Do ANNs offer a realistic solution to a given problem?”. As ANNs are capable of
dealing with a variety of problems, and as the business community usually has a
variety of different problems under review, it is intended that the schema will be
general in its approach, whilst maintaining effectiveness and accuracy.
The schema is developed both as a flow-chart and as a computer program.
By establishing the specific data and training requirements of ANNs it is possible to
construct a series of questions of a non-technical nature, which the manager can
consider concerning the problem under review. The schema follows the flow of the
responses and culminates in a suggestion for further action. The reasons for the
suggested actions are explained, allowing the manager to consider various courses of
action depending on the resources available to him or her. Where appropriate the
schema will suggest alternative decision making techniques which could prove more
cost efficient or accurate than the use of ANNs.
Chapter 3 - Explanation of the Fundamental
Concepts of ANNs
Introduction:
This chapter will seek to place ANNs in the broader context of computer software. A
highly simplified description of the workings of ANNs will follow. Once this basic
understanding has been established a more detailed explanation will follow, which is
intended to equip the reader with a reasonable level of knowledge on the topic, to
enable either further study or practical application.
An outline explanation of the fundamental concepts of Artificial Neural Networks:
An ANN is simply a computer program which, through the adjustment of
mathematical weights, is able to create a model capable of producing results (usually
in the form 1 or 0, or scaled using decimals from 0 to 1), for a given set of numeric
input data, to a reasonable degree of accuracy. The network will often include a
front-end subsystem (Attrasoft User’s Guide and Reference Manual, 1996) to enable both
data encoding and data decoding:
Data encoding: to convert user-application data to neural input data.
Data decoding: to convert neural output data back to user-application data.
ANNs can be considered as part of the larger group of computer based
techniques referred to as Knowledge Based Systems.
Knowledge Based Systems:
There are numerous forms of computer systems which fall under the general
heading of Knowledge Based Systems (KBS). This use of computing power can be
defined as:
“a system within which data is analysed by comparison with sets of pre-obtained
data by following specific rules and/or weighted relationships” (author)
To facilitate this comparison the system will require a set (library, files, historic
records) of knowledge. This knowledge is the basis upon which the system operates
and can take numerous forms:
• Financial Data - credit limit, accounting ratios, past sales figures
• Human Resource Data - qualifications, age, experience (years)
• Operational Data - machine failures (frequency), tolerances, re-order levels
As can be seen from the above examples, the knowledge base is often a form
of database of the sort now commonly found within most organisations. The
difference between KBS and conventional databases is the level of interrogation and
control which is placed within the system’s remit. As opposed to merely storing data
the system will be called upon to ‘trawl’ through the data to identify trends and
patterns of behaviour, or it may use its knowledge to instigate some form of action.
For example if a bill became overdue the system could issue a reminder without the
need for an operator to intervene. This is possible because the system knows the date,
the date the last payment was made, the difference between this date and today’s and
the company’s policy on ‘debtor days’. This example also indicates the limited level of
understanding involved - the system’s knowledge, in this instance, can in no way be said
to be knowledge in the sense that a human would know what it was to have an overdue bill.
It becomes apparent that many modern databases are capable of achieving
similar results to knowledge based systems. The difference between the conventional⁹
knowledge based system and databases is becoming increasingly subtle and is more
9 Conventional as opposed to ANNs
an emphasis on use as opposed to structure. Most databases (Microsoft’s Access for
example) are capable of interrogating data and also of issuing notification should this
be required.
The difference between Artificial Neural Networks and Conventional Knowledge
Based Systems:
As previously discussed, ANNs fall under the broad heading of KBS;
however, it is important to recognise that there are fundamental differences between
ANNs and other KBSs. Whilst a KBS has the rules and relationships concerning its
knowledge programmed into the system (albeit kept separate from the knowledge),
ANNs develop their ‘own’ rules and relationships through a process of self-learning.
The self-learning abilities of ANNs are most simply explained by example:
Suppose the relationship between the following set of data was desired:
Advertising Spend £’s    100    150     50     10    200
Sales £’s                300    450    150     30    600
Table 1 - Sample Problem
From the above table, by dividing sales by advertising spend (or by drawing a graph),
it is quite possible to estimate that sales are three times advertising spend. It is
possible to estimate this figure because a) we appreciate and could prove a
relationship between the variables, and b) there are relatively few variables, which c)
enables a simplistic approach to the formulation of an equation (relationship). It can
be appreciated that a more complex relationship may exist, which is beyond the
simplistic approach used so far. To solve a multivariable and non-linear¹⁰
relationship would require the use of statistical techniques which are often
10 Non-linear - a relationship which would create the graph of a curve as opposed to a straight line, the equation of which
would contain powers ($x^2$ etc.).
complicated and/or rely on a degree of approximation.
ANNs take a different route to establishing the relationship between variables
- by adjusting the values of numerical weights within an equation (function). The
weights will act upon the data to alter its value with the intention of producing the
desired result. To enable this process to take place the system must be exposed to the
data one set at a time (e.g. Advertising Spend of £100 and Sales of £300 is the first set
of data in Table 1). The computer will, in the first instance, apply a guess as to the
value of the weights to be used (although this starting value may well be
pre-programmed or random (Hopgood, 1993)). This ‘guess’ will, inevitably, prove to be
wrong and the system will alter the weights and retry.
The first set of data will be treated as below:
Advertising Spend £’s    Weight    Function          Result    Desired Result £’s
100                      1         Spend * Weight    100       300
100                      2         Spend * Weight    200       300
100                      3         Spend * Weight    300       300
Table 2 - Simplified weight method
It can be seen from the above that after a series of iterative steps the system was able
to produce the desired result, and in our previous example this weight would be
acceptable for all of the data sets.
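The iterative steps in Table 2 can be sketched in Python. This is only a minimal illustration: the function name `find_weight`, the starting weight of 1 and the fixed step of 1 are assumptions taken from the table, whereas a real ANN derives the size of each adjustment from the error.

```python
def find_weight(spend, desired, start=1.0, step=1.0, max_tries=100):
    """Raise a single weight step by step until spend * weight gives the desired result."""
    weight = start
    history = []  # (weight, result) pairs, one per attempt, as in Table 2
    for _ in range(max_tries):
        result = spend * weight
        history.append((weight, result))
        if result == desired:
            return weight, history
        weight += step  # the 'guess' was wrong, so alter the weight and retry
    raise ValueError("no suitable weight found")


# First data set from Table 1: spend of 100, sales of 300.
weight, history = find_weight(100, 300)

# Check whether the same weight is acceptable for every data set in Table 1.
table_1 = [(100, 300), (150, 450), (50, 150), (10, 30), (200, 600)]
fits_all = all(spend * weight == sales for spend, sales in table_1)
```

The `history` list holds the three attempts shown in Table 2, and `fits_all` confirms that the final weight also reproduces the remaining columns of Table 1.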
The function used in the above example is linear, as opposed to the non-linear
functions used within ANNs. The number and relationship of the variables is also
more simplistic than would normally be encountered (for an example of the more
complicated OR problem see Appendix 1).
It is possible to imagine that if the relationship was more complicated and our
weight of 3 proved unsuitable for the next data set it could be adjusted again and then
re-used on both sets until a satisfactory relationship was obtained. Most ANNs have
an adjustable degree of tolerance (between the ANN output and the training set’s expected
result). For example, WinNN has an adjustable target error to determine the acceptable
Root Mean Square (RMS) error¹¹; once the RMS error falls to the target, training of that
net is said to be complete. Note that the lower the acceptable error, the more refined, and
less generalised, the net becomes.
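The RMS error test can be illustrated with a short Python sketch. It follows the footnoted definition (the square root of the mean of the squared differences). The function names, the sample outputs and the use of "less than or equal to the target" as the stopping test are assumptions made for illustration and are not taken from WinNN.

```python
import math


def rms_error(outputs, targets):
    """Square root of the mean of the squared differences between outputs and targets."""
    squared = [(o - t) ** 2 for o, t in zip(outputs, targets)]
    return math.sqrt(sum(squared) / len(squared))


def training_complete(outputs, targets, target_error):
    """Assumed stopping test: training stops once the RMS error is at or below the target."""
    return rms_error(outputs, targets) <= target_error


# Hypothetical network outputs compared with the expected results of four training sets.
outputs = [0.1, 0.9, 0.8, 0.2]
targets = [0.0, 1.0, 1.0, 0.0]
error = rms_error(outputs, targets)
```

With these figures the RMS error is roughly 0.158, so a target error of 0.2 would end training whereas a stricter target of 0.1 would require further weight adjustment.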
The procedure described in this simplified model could be said to represent a
single neuron (processing unit, cell). To enable more complicated relationships to be
developed ANNs have more than one neuron and it is not uncommon for the results
of one neuron to be the input of another. If these connections were viewed pictorially
they would form a network of interconnected neurons, hence the name: Artificial (non-human)
Neural (processing units) Network (interconnection of neurons).
Explanation of the operation of ANNs:
First Principles:
Knowledge Based Systems:
As discussed in the introduction, ANNs are a form of software that has the
ability to self learn. Unlike more conventional (rule based) forms of knowledge-
based systems the algorithms used to enable the inference engine (rule interpreter) to
work are not hard programmed or explicit rules based along the IF...THEN...ELSE
pattern. Instead the program uses a series of mathematical weights to establish data
relationships. To enable an understanding of the difference it is first necessary to
explain the basic components within knowledge based systems.
Knowledge based systems contain three core components: an interface with the
11 RMS - the square root of the mean of a set of squared numbers
user (outside world) to enable both data input (keyboard, sensors, etc.) and output
(monitor, servos, printout, etc.), a knowledge base (data base) and an inference
engine (rule interpreter, instructions, ‘main program’). There are two other
components often found within knowledge based systems: an explanation module¹² to
enable the reasoning behind the decision made to be shown, and a knowledge
acquisition module to enable the knowledge base to be built by use of one or more of
the acquisition techniques possible (Hopgood, 1993). Diagram 3 illustrates the
relationship between these components:
Diagram 3 - Knowledge Based System
As shown in diagram 3 the relationship between the components within a KBS is
relatively straightforward. Information is gathered from the outside world, stored
within a data base and, upon a query being made, accessed to provide an answer.
Rule Based System:
A rule based system is based, fundamentally, on the IF...THEN...ELSE
structure (propositional logic/calculus). The following illustrates this point:
IF credit level is greater than pre-agreed limit
THEN stop credit and issue reminder
ELSE do nothing
Where the credit level is computed from inputs and the pre-agreed limit is contained
12 ANNs have great difficulty in satisfying this requirement and your attention is drawn to the discussion in Chapter 4
within the knowledge base. It is both common and desirable that the information
required to process the rule is contained explicitly within the knowledge base, as
opposed to implicitly within the program, to enable a simpler and more robust
method of updating to be used (e.g. entries in a data base are changed, as opposed to
altering the program’s source code) (Hopgood, 1993).
Whilst it can be appreciated that this is a simplistic view of the workings of
a rule based system, further developments serve only to improve and compound this
basic methodology (see, for example, Appendix 2 - Bayesian Updating).
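The credit rule described in this section, and the benefit of holding the limit in the knowledge base rather than in the program, can be sketched in Python. The dictionary `knowledge_base`, the function `credit_rule` and the figures used are hypothetical and serve only to illustrate the IF...THEN...ELSE structure.

```python
# Hypothetical knowledge base: the pre-agreed limit is stored as data, not as program code.
knowledge_base = {"credit_limit": 5000.0}


def credit_rule(credit_level, kb):
    """IF credit level is greater than the pre-agreed limit
    THEN stop credit and issue reminder ELSE do nothing."""
    if credit_level > kb["credit_limit"]:
        return ["stop credit", "issue reminder"]
    return []  # ELSE do nothing


before = credit_rule(6000.0, knowledge_base)

# Updating the rule only requires a change to the data base entry, not to the source code.
knowledge_base["credit_limit"] = 7500.0
after = credit_rule(6000.0, knowledge_base)
```

The same credit level triggers the actions before the limit is raised and none afterwards, although the rule itself has not been edited.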
Artificial Neural Networks:
Overview:
The key difference between ANNs and KBS lies with the inference engine.
As opposed to having a logic imposed on it, the network is allowed to develop its
own logic by means of training, either supervised or unsupervised. Weights are used
to determine the strength of relationships and there is no IF...THEN...ELSE. Instead
the network decides the relevance of inputs and their interconnections based on its
own experience (i.e. it has been trained).
The network consists of a selection of nodes or cells arranged structurally in a
predetermined topology. The nodes are grouped in layers. This takes the form of an
input layer, one or more hidden layers and an output layer. Each node accepts
various inputs, adjusts them via weights, adds the weighted inputs together, passes
the sum through a non-linear function and outputs the result to another cell. At the
output layer the result is compared with the expected answer and the
difference is passed back through the network to allow weight adjustment to correct errors
(backpropagation). A simplified single neuron calculation appears thus:
Diagram 4 - Single Neuron Calculation
Pictorially this can be represented thus:
Diagram 5 - Representation of a Neuron
This process is explained, in detail, below and would normally be performed by
numerous neurons/cells/nodes within one or more layers at the same time, i.e. in
parallel.
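The single neuron calculation can also be sketched in Python. This is a minimal sketch which assumes the sigmoid as the non-linear function and uses arbitrary illustrative weights. The name `neuron_output` is hypothetical, and the bias term is omitted here for simplicity.

```python
import math


def neuron_output(inputs, weights):
    """Weight each input, add the weighted inputs together and pass the sum
    through a non-linear (sigmoid) function."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-net))


# One neuron with two inputs (illustrative values only).
first = neuron_output([1.0, 0.0], [0.4, -0.3])

# The result of one neuron can become the input of another.
second = neuron_output([first], [2.0])
```

Here the first neuron sums to 0.4 and outputs approximately 0.599, which is then weighted and processed again by the second neuron.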
Components:
It is important to appreciate that ANNs gain their ability not from a
predetermined layout or selection of weights but from the network’s ability to adjust
weights and alter (strengthen/weaken) connections between nodes. Before
attempting to explain the mathematics behind these interconnections an explanation
of the key components of the network is required.
Nodes:
Medsker, Turban and Trippi (1996) comment that most commercial ANNs
have between 10 and 1,000 nodes arranged in three layers, and that although 4, 5 or
more layers are not unheard of, this is not deemed necessary for business applications.
Hopgood (1993) describes a node’s role as “to sum each of its inputs, subtract
a bias term, θ, and pass the result through a non-linear function, $f_{non-linear}$, known as
the activation function”. Hopgood’s emphasis on the bias term is discussed below.
ANNs have sets of these calculating functions and a description is given by Patterson
(1996) as “Every ANN is composed of a set n of simple neural computing elements
(neurons, units, processing elements or PEs, cells)” and where this set of cells can be
given as:
$C = \{c_i\}, \quad i = 1, 2, \ldots, n.$
Patterson goes on to comment that cells can be grouped into three distinct categories;
input, hidden (or interior) and output.
The interior layer of cells are the nodes which perform the majority of the
calculation process and are discussed under various headings below (Weights and
Bias Terms, Generalisation, Choice of Function). Input cells are the cells which take
the initial input of stimuli (discrete keyed values or continuous sensor data) whilst
output cells enable the display of results or the control of effectors. The inputs and
outputs are usually represented by the vector x of n dimensions and the vector y of m
dimensions (simply put; $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_m$).
The input data often takes the form of a text file in PC based neural nets:
Diagram 6 - Screen dump of a text file for use in WinNN
The above input file demonstrates the relatively simplified form of data which
may be used in ANN training and operation. The above example does not feature
scaling of the variables as this is not required in this instance, however it does
provide a representation of the form input files often take. The file represents 4
training sets, each with 2 inputs and 1 output (4,2,1). In the first training set (case) $x_1$
would be 0, $x_2$ would be 0, and the expected result ($y_1$) would be 0. This would be
followed by the second set which would be 0,1,1 respectively and the third etc. This
data represents the commonly used XOR example/problem and gives the result 1 when
exactly one of the two inputs is 1, and 0 otherwise (Patterson, 1996). The trained network
could be used to solve simple yes/no problems, for example:
Account      Purchased Over    Arrange visit     Reason
Customer?    200 Units?        by sales staff
n            n                 n                 Probably not trade customer
n            y                 y                 Offer trade account
y            n                 y                 Try to increase sales to trade customer
y            y                 n                 Credit limit probably reached
Table 3 - Input file explanation
The above example is highly simplified. It does, however, represent the style
of business control system which uses yes/no responses. It is important to note that
the reason for the decision would not be given by the ANN.
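The way the yes/no table maps onto the XOR input file can be sketched in Python. Note that this is not a trained network: the function `xor_decision` is a hypothetical stand-in which computes XOR directly, so that the encoding of user data into neural inputs (y/n to 1/0) and the decoding of the output back again can be seen in isolation.

```python
ENCODE = {"n": 0, "y": 1}   # data encoding: user-application data to neural input data
DECODE = {0: "n", 1: "y"}   # data decoding: neural output data back to user-application data


def xor_decision(account_customer, over_200_units):
    """Stand-in for the trained network: answers y when exactly one input is y."""
    x1 = ENCODE[account_customer]
    x2 = ENCODE[over_200_units]
    y1 = x1 ^ x2  # exclusive OR of the two encoded inputs
    return DECODE[y1]


# Reproduce the 'Arrange visit by sales staff' column for all four cases.
rows = [(a, b, xor_decision(a, b)) for a in "ny" for b in "ny"]
```

As in the table, only the decision is produced; the Reason column has no counterpart in the output.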
Weights and bias terms:
Once the data is entered into the network its connection from input layer to
calculation node is used to facilitate the addition of weights. Patterson (1996) uses
the following notation:
$net = x_1 w_1 + x_2 w_2 + x_3 w_3 = \sum_i x_i w_i$          Equation 1
where $x$ is the input variable and $w$ is the weight.
Hopgood (1996) makes the point of subtracting a bias weight to give:
$net = \sum_i x_i w_i - \theta$          Equation 2
whereas Patterson (1996) prefers the use of a fixed bias input of 1 with weight $w_0$ (i.e. $1 \cdot w_0$) on one of the
input links, where $w_0 = -\theta$. The use of either method is considered acceptable.
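That the two treatments of the bias give the same value of net can be checked with a short Python sketch. The function names and the sample inputs, weights and θ are illustrative assumptions.

```python
def net_with_threshold(x, w, theta):
    """Equation 2: sum of weighted inputs minus the bias term theta."""
    return sum(xi * wi for xi, wi in zip(x, w)) - theta


def net_with_bias_input(x, w, w0):
    """Alternative form: an extra input fixed at 1 carries the weight w0."""
    extended_x = [1.0] + list(x)
    extended_w = [w0] + list(w)
    return sum(xi * wi for xi, wi in zip(extended_x, extended_w))


x = [0.2, 0.7, 0.5]      # illustrative inputs
w = [0.4, -0.6, 0.9]     # illustrative weights
theta = 0.25

a = net_with_threshold(x, w, theta)
b = net_with_bias_input(x, w, -theta)   # w0 = -theta
```

Both calls return -0.14 (to within floating point rounding), which is why either notation may be used.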
The weights remain independent of the variables (x) so as to facilitate their
adjustment during backpropagation. It is helpful (Patterson, 1996) to view the
relationship between the weights, class membership and the bias value in terms of a
two-dimensional plot. In a more complicated example the weight vector ($w_i$)
would define a hyperplane in n-space, where n is the number of variables.
In this example n=2 and so the plot is two-dimensional.
Diagram 7 - Class Membership
The significant points in Diagram 7 are the offset, giving the value of the
bias weight ($w_0$), and the slope of the line, which is given by $-w_1/w_2$. Thus the
formation of the line is derived entirely from the weights and future x values will be
shown as either belonging to the class or not. Patterson (1996) places particular
importance on this boundary line as he identifies it as the key to the net’s autonomy
through its ability to alter weights and so define what is within the set and what is
outside it. The example shown is linearly separable in that its boundary is defined by a
straight line/plane. This is largely due to the simplicity of the example
(2-dimensional) and partly due to the fact that it would be intended for use in a single
layer network. As an ability to cope with non-linearity is one of the key features of
ANNs they are, of course, capable of dealing with more complicated examples.
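The boundary line can be sketched in Python for the two-dimensional case. The sketch assumes the boundary is the set of points where $w_0 + w_1 x_1 + w_2 x_2 = 0$, which gives the slope $-w_1/w_2$ quoted above. The intercept $-w_0/w_2$ follows from the same assumption and is not stated in the text. The weights and function names are illustrative.

```python
def boundary(w0, w1, w2):
    """Slope and intercept of the line w0 + w1*x1 + w2*x2 = 0, written as x2 = slope*x1 + intercept."""
    return -w1 / w2, -w0 / w2


def in_class(x1, x2, w0, w1, w2):
    """A point belongs to the class when the weighted sum (including the bias weight) is positive."""
    return w0 + w1 * x1 + w2 * x2 > 0


w0, w1, w2 = -1.0, 1.0, 1.0   # illustrative weights only
slope, intercept = boundary(w0, w1, w2)

inside = in_class(1.0, 1.0, w0, w1, w2)    # lies above the line
outside = in_class(0.0, 0.0, w0, w1, w2)   # lies below the line
```

Altering any of the three weights moves or tilts the line, and so changes which future points fall inside the class.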
Generalisation:
To deal with n-dimensions and non-linearity ANNs generalise. Patterson
(1996) discusses generalisation in terms of “describing the whole from some of the
parts” and points out that the alternative to an ability to generalise is knowing
everything. It is possible to identify an object by knowing some general rules
involving that class of object without knowing every member of that class. For
example a metal frame with two wheels, a set of handlebars, a saddle and fitting
various size requirements is probably a bicycle. It is not necessary to memorise
every manufacturer’s catalogue.
ANNs generalise by creating a class which exists in weight space with its
boundary given by the mapping function F (Patterson, 1996). Mapping functions are
either autoassociative or heteroassociative meaning they map the original pattern
from noisy/incomplete data or map input patterns to different output patterns
respectively. Mapping functions are shown mathematically as $F: x \to y$. The
non-linear boundary can be shown by the simplified diagram:
Diagram 8 - Universe of objects
From Diagram 8 it is possible to see that the boundary established by the
network includes both the training set data and other instances of the data not given
in the training set but which would be encountered if more sets of data were made
available - therefore giving the network an element of flexibility. The boundary must
therefore include all examples of the training set, all examples of data corresponding
to the net’s function but not known at time of training, and exclude all other data sets.
Once this has been achieved the generalisation can be said to have been a complete
success. It is apparent that the method of training and the selection of data will have
a particular bearing on the accuracy of this process.
Choice of mapping or activator function:
As mentioned in ANN Overview (above) the summed weighted inputs are passed
through a non-linear function before proceeding to the next cell or output layer. This
is the mapping function referred to above so that F:x→y, and is known as the
activator function (or activation level/summation function - Medsker, Turban and
Trippi, 1996). It is suggested by Patterson (1996) that the choice of activator should
be a “monotonic nondecreasing function of net”. This simply means that, as the
value of net rises, the output of the function never falls: the curve rises (or stays
level) from left to right, so that a larger input never produces a smaller output
than a lower one (for the sigmoid, x=2 gives y=0.88 and
x=3 gives y=0.95). Hopgood (1996) makes the point of stating that
“The weights and biases can be learnt, and the learning behaviour of a net depends
on the chosen algorithm”. It is further stated that the sigmoid function is most
commonly used and Patterson (1996) concurs with this statement. The sigmoid
function is given as:
$f_{nonlinear}(x) = \frac{1}{1 + e^{-x}}$          Equation 3
and would appear graphically thus:
[Plot: Sigmoid Function, for x from -5 to 5; vertical axis from 0 to 1.]
Diagram 9 - Sigmoid Function
The above diagram (Diagram 9) shows some of the key features of the sigmoid
function and thus its reasons for use, these features include:
• The ability to make all values positive.
• The relatively fine level of discrimination (the slope).
• The fact that all results are given as between 0 and 1.
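These features can be checked with a short Python sketch of Equation 3. The sample points are those on the horizontal axis of Diagram 9 (plus zero). The variable names are illustrative.

```python
import math


def sigmoid(x):
    """Equation 3: f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


xs = [-5, -3, -1, 0, 1, 3, 5]
ys = [sigmoid(x) for x in xs]

# All results are positive and lie between 0 and 1.
all_between_0_and_1 = all(0.0 < y < 1.0 for y in ys)

# Monotonic nondecreasing: a larger input never gives a smaller output.
never_decreases = all(a <= b for a, b in zip(ys, ys[1:]))
```

Rounded to two decimal places the function returns 0.88 for x=2 and 0.95 for x=3, the values quoted earlier.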
Data pre-processing:
It can be appreciated that the data under investigation may take various forms.
The ANN will require inputs which are of a numeric nature. This does not prevent
non-numeric data being analysed provided it can be converted, with consistency, into
numbers. For example, risk is a common business concept which is regularly
translated from the vague - safe, moderately safe, risky, very risky - to a range of
probabilities (say 100%, 75%, 50%, 25% probability of a favourable event
occurring).
Once the data has been gathered, and given a numeric value if required, the
efficiency and accuracy of the ANN can be enhanced by information pre-processing.
Wasserman (1989) and Patterson (1996) both concentrate on normalisation, which is
a common form of pre-processing. Normalisation is a method by which all the data
being processed can be given a common minimum and maximum value. For
example, readers familiar with statistics may draw a parallel between the
normalisation of the data with techniques used in statistics to determine probabilities
using the normal distribution curve (NDC). Here any distribution of data can be
mapped (converted) to the NDC which has its probabilities pre-calculated.
The most common form of normalisation will see all the data converted to
values between 0 and 1. This has the advantage of both reducing the difficulty of
manipulating large numbers (simply put it is easier to manipulate, say, 0.2 than
2,000,000) and enhancing the network’s ability to adjust weights by reducing
unnecessary emphasis (for example in loan calculations interest rates may be given as
% or decimals, loan size in millions). Certain activator functions will restrict output
to between 1 and 0, regardless of input (Medsker, Turban and Trippi, 1996). For
example the sigmoid function mentioned above:
Input      $1/(1 + e^{-x})$
0.5        0.6225
5          0.9933
17         0.9999999586
35         0.99999999
35.2       1
2256       1
-5         0.0067
Table 4 - Sigmoid values
It can be seen from the above table that data which is beyond a certain range
approaches a value of one, or of zero for negative inputs (the exact point at which it
appears as one is dependent on the number of decimal places used and rounding). As the
data is multiplied by weights
and summed before entering the activator function it can be appreciated that no
accuracy is gained by maintaining a mixture of large and small numbers (as it will
convert everything to between 0 and 1 regardless). It can also be recognised that the
output from the net, having passed through various activator functions, will be of
decimal or Boolean form (1 or 0). It is important to recognise that as the results for
the sigmoid function return 0 and 1 for a large range of negative and positive
numbers, respectively, it is advisable to restrict the inputs and outputs to values
between 0.1 and 0.9 (Tucker, 1996). This also avoids the use of either 1, which has
the effect of distorting part of the sigmoid function ($e^{-1} = 1/e$), or 0, which distorts the
sigmoid ($e^{0} = 1$) and has the effect of negating weights ($x_1 w_1 = 0$ for $x_1 = 0$).
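The loss of discrimination for large values can be illustrated in Python. The sketch recomputes some of the entries of Table 4 to four decimal places and then compares how far apart two outputs are for small inputs and for very large inputs. The inputs 35 and 36 in the second comparison are an illustrative choice.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Selected inputs from Table 4, rounded to four decimal places.
rounded = {x: round(sigmoid(x), 4) for x in [0.5, 5, 17, -5]}

# Difference between two outputs when the inputs lie between 0.1 and 0.9 ...
gap_small_inputs = sigmoid(0.9) - sigmoid(0.1)

# ... and when the inputs are very large: the outputs are almost indistinguishable.
gap_large_inputs = sigmoid(36) - sigmoid(35)
```

At four decimal places an input of 17 already appears as 1, and the gap between the outputs for 35 and 36 is negligible compared with the gap of roughly 0.19 for the small inputs.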
There are numerous methods of data pre-processing and Tucker (1996)
contains an accessible description of six common methods. These are given as
Distribution truncation and squashing functions, Natural log regression, Ratio
splitting, Positive/negative split, Variable pre-selection, and Data squashing.
These can be explained briefly as:
Technique                    Description
Distribution truncation      The removal of outliers, i.e. the removal of unusually large,
and squashing functions      or small, numbers from the data set.
Natural log regression       Using logarithms to convert data into small units.
Ratio splitting              Not inputting the result of a ratio but the numerator (top)
                             and denominator (bottom) values separately.
Positive/negative split      Separating a variable which has both positive and
                             negative examples into two separate variables (e.g. Profit
                             becomes Profit or Loss, as opposed to Profit £200 or Profit
                             -£150 (loss)).
Variable pre-selection       Simply manually deciding which variables to include and
                             which to leave out.
Data squashing               Converting all data so that it is within a pre-defined range
                             (0.1 to 0.9 being given as most appropriate).
Adapted from - Tucker (1996)
Table 5 - Data pre-processing
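Two of the techniques in Table 5 can be sketched in Python. The sketch assumes that data squashing is a simple linear rescaling into the range 0.1 to 0.9, and that the positive/negative split yields a separate profit and loss variable. Tucker (1996) may define these differently, and the function names and sample figures are hypothetical.

```python
def squash(values, low=0.1, high=0.9):
    """Data squashing (assumed linear): rescale all values into the range low..high."""
    smallest, largest = min(values), max(values)
    span = largest - smallest
    if span == 0:
        # All values identical: place them in the middle of the range.
        return [(low + high) / 2 for _ in values]
    return [low + (v - smallest) * (high - low) / span for v in values]


def split_sign(value):
    """Positive/negative split: one signed variable becomes (profit, loss)."""
    return (value, 0.0) if value >= 0 else (0.0, -value)


loan_sizes = [500_000, 1_000_000, 2_000_000]   # illustrative loan sizes in £
squashed = squash(loan_sizes)

profit_case = split_sign(200.0)    # Profit £200
loss_case = split_sign(-150.0)     # Profit -£150 (loss)
```

The loan sizes become approximately 0.1, 0.367 and 0.9, which are far easier for the network to weight alongside interest rates expressed as decimals.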
It is also important to emphasise that the data (variables) used within an ANN
should have some form of theoretical basis for a relationship before they are
included¹³. There is little point comparing interest rates, inflation, project life, level
of risk and outside temperature if the problem under consideration is to determine
the ‘cost of capital’ to use in net present value (NPV) calculations.
Training:
By their very nature, ANNs require training. This training can be viewed in a
similar manner to human training in that the activity is repeated until the system
produces a satisfactory result (or it is decided to abandon the training run). Hopgood
(1993) suggests that the training process is, more correctly, an “error reduction”
process. Alternatively Patterson (1996) suggests that it is “adaptive learning in a
dynamic environment” emphasising the flexibility of the method.
There are, however, three separate methods of training - supervised,
reinforced and unsupervised. Wasserman (1989) argues that unsupervised training is
13 The appendix of Farrar, Tucker and Bugmann (1997) provides an example of how this could be approached.
the only “biologically plausible” method of training ANNs. The desire to be
“biologically plausible” suggests Wasserman is more concerned with the theoretical
study and attempted replication of a brain, than practical application of a
mathematical technique. Plausibility should be weighed against ‘usefulness as a tool’
when ANNs are used in applications not related to the study of
psychology/neurology. The use of unsupervised learning will be discussed under the
heading of Topology below (Kohonen being one of its originators). Supervised and
reinforced learning methods are broadly similar in philosophy and so an explanation
of supervised training will be made first.
In supervised training the results obtained from the cells are compared with
the desired results (contained within the original input). The difference is referred to
as an error and the weights contained within the network are adjusted
(backpropagation). This adjustment continues until the sum of the squares of the
differences (errors) is minimised (in a way similar to the line of best fit
calculations of linear regression).
This process can be simplified thus:
• Subtract the output from the target contained within the input vector.
• Square the difference to remove negative signs14.
• Add all the squared differences together.
• Compare this answer with the desired level of error.
• If the error is unacceptable, adjust the weights and begin again.
The minimisation is said to be complete when the error has reached an acceptable
level.
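The steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method of any particular ANN package: the function names, the learning rate and the data are invented for the example, and the "network" is reduced to a single weight adjusted by a simple gradient step rather than full backpropagation.

```python
def sum_squared_error(outputs, targets):
    """Subtract each output from its target, square the difference, add the squares."""
    return sum((target - output) ** 2 for output, target in zip(outputs, targets))


def train_single_weight(inputs, targets, desired_error, rate=0.01, max_runs=1000):
    """Adjust one weight until the summed squared error is acceptable.

    Assumption for illustration: the output is simply weight * input.
    """
    weight = 0.0
    for _ in range(max_runs):
        outputs = [weight * x for x in inputs]
        if sum_squared_error(outputs, targets) <= desired_error:
            break  # error has reached an acceptable level
        # Error unacceptable: move the weight in the direction that reduces the error.
        weight += rate * 2 * sum((t - o) * x for x, o, t in zip(inputs, outputs, targets))
    final_outputs = [weight * x for x in inputs]
    return weight, sum_squared_error(final_outputs, targets)


# Errors of -100 and +100 would cancel if simply added; squaring keeps both visible.
print(sum_squared_error([400.0, 600.0], [500.0, 500.0]))  # 20000.0

# Hypothetical data in which each target is twice its input.
weight, error = train_single_weight([1.0, 2.0, 3.0], [2.0, 4.0, 6.0], desired_error=0.01)
print(round(weight, 1), error <= 0.01)
```

The training run stops either when the error is acceptable or when `max_runs` is reached, which corresponds to the decision to abandon a training run.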
Reinforced training follows a broadly similar route to supervised but, as
opposed to calculating the level of error, the ANN is merely informed that it is either
wrong or right and continues to adjust weights until a correct result is identified.
Patterson (1996) suggests that the method is seldom used in practice and attention
should instead be paid to supervised and unsupervised learning (other authors make
little or no mention of reinforced training).
14 Removal of negatives: otherwise an error of -100 plus an error of 100 would indicate no error.
It should be noted that it is possible to over-train a network. In network terms, an
over-trained ANN reproduces its training data closely but performs poorly on data it
has not seen. To continue the
human analogy, this would represent an employee trained, in a vocational way, to
such a level that they were only able to perform their present role, and none other.
This may occur, for example, in a factory where an operative has been doing the
same repetitive job for such a long period of time that their ability to transfer any of
the skills they have learnt becomes hampered.
Topology:
The Multilayer Perceptron and Kohonen’s Self Organising Net will be used to
give a more detailed explanation of ANN construction.
The Multilayer Perceptron - an example of supervised learning/training:
The multilayer perceptron is most commonly used for non-linear estimation
or classification. It is often referred to as a feedforward network15. This net
comprises the conventional input and output layers, with a programmer (user)
defined number of hidden layers and nodes between them. The number of nodes used in
the input/output layers is data specific. A diagram representing the multilayer
perceptron is provided below:
15 Generic name used to refer to this network in particular - any network in which the data flows towards the output is
technically feedforward.
Diagram 10 - The Multilayer Perceptron
It is usual for the number of input nodes to equal the number of variables
under consideration, likewise the number of output nodes will equal the number of
desired outputs. The number of nodes in each layer is often used to give the network its name.
For example the network above may be termed a 3-4-4-2 network as it has 3 input
nodes, two sets of 4 hidden or computational nodes and 2 output nodes and would be
described as a four layer network (although Hopgood (1993) indicates that there is
argument concerning this issue as some claim the input layer should not be counted).
It can be seen from diagram 10 that each of the calculation nodes is
connected to every node in the next layer, but that no nodes are connected
vertically (with reference to the diagram’s orientation only). The weights are
applied on the connections between the nodes, with each node being a
non-linear function (activator function). The process could be described as: input,
weighting, conversion via activator function, become input for next layer,
weighting, conversion via activator function and output to final layer, where
error checking will occur and instigate backpropagation to adjust weights (note: this
method utilises supervised training).
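The forward flow of data through a 3-4-4-2 network can be sketched as follows. This is an illustrative sketch only: the sigmoid activator function, the random starting weights and the names `make_network` and `feed_forward` are assumptions made for the example, bias terms are omitted, and the error checking and backpropagation stages are not shown.

```python
import math
import random


def sigmoid(x):
    """A common non-linear activator function (assumed; the text does not name one)."""
    return 1.0 / (1.0 + math.exp(-x))


def make_network(layer_sizes, seed=0):
    """Create one set of connection weights per pair of adjacent layers."""
    rng = random.Random(seed)
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


def feed_forward(network, inputs):
    """Pass the inputs through each layer in turn and return the final outputs."""
    values = inputs
    for layer in network:
        # Each node weights every value from the previous layer, sums the results
        # and converts the total via the activator function.
        values = [
            sigmoid(sum(w * v for w, v in zip(node_weights, values)))
            for node_weights in layer
        ]
    return values


network = make_network([3, 4, 4, 2])            # 3 inputs, two hidden layers of 4, 2 outputs
outputs = feed_forward(network, [0.5, 0.1, 0.9])
print(len(network), len(outputs))               # 3 sets of connections, 2 outputs
```

Note that a four layer network has only three sets of weighted connections, which is one reason the counting of layers is disputed.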
The number of both layers and calculation nodes is dependent on the
programmer’s decision given a certain problem. It is common to start at a low number
of layers/nodes and then increase this number until the desired level of
accuracy/speed is achieved.
The Kohonen self organising net - an example of unsupervised learning/training:
As mentioned previously, the Kohonen self organising net is a form of net
which learns using an unsupervised method. This type of network is most commonly
used for pattern recognition and is often referred to as The Self Organising Feature
Map (Patterson, 1996). As well as differing from the Multilayer Perceptron (MLP)
in training method and application it also differs in that it is a single layer
feedforward network as opposed to multilayer. In addition the network works on a
principle of “winner takes all” (Wasserman, 1989). This means that as the
information is processed within the layer one, and only one, node will transmit an
output. This is why the method is often referred to as a “competitive” one (Patterson,
1996 and Wasserman, 1989).
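The “winner takes all” principle can be illustrated with a short Python sketch. It is a simplification made for illustration: the function names, the learning rate and the sample data are assumptions, and only the winning node is updated, whereas a full Kohonen map would also adjust the winner's neighbours.

```python
def winner(nodes, sample):
    """Return the index of the node whose weights lie closest to the sample."""
    def squared_distance(weights):
        return sum((w - s) ** 2 for w, s in zip(weights, sample))
    return min(range(len(nodes)), key=lambda i: squared_distance(nodes[i]))


def train_step(nodes, sample, rate=0.5):
    """Move only the winning node towards the sample; all other nodes are unchanged."""
    best = winner(nodes, sample)
    nodes[best] = [w + rate * (s - w) for w, s in zip(nodes[best], sample)]
    return best


# Two hypothetical nodes, each holding two weights; no target output is supplied.
nodes = [[0.0, 0.0], [1.0, 1.0]]
best = train_step(nodes, [0.9, 0.7])
print(best)   # index of the single node that "fires" for this sample
```

Because no desired result is provided, the nodes organise themselves around whatever patterns recur in the data, which is what makes the training unsupervised.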
Diagram 11 - Kohonen Self Organising Feature Map
Unlike the Multilayer Perceptron the Kohonen net learns in an unsupervised
manner. As the net is attempting to replicate a pattern in various sets of data it trains
by continually processing different data-sets until it is satisfied that each new run
will create a replica of previous runs (within pre-set tolerances). For example, a child
is not always told the same word repeatedly until he or she learns to say it (by being
told when he or she has said it incorrectly) but rather learns by being exposed to
numerous examples of speech and establishes the ability to replicate these patterns
independently. This learning style is sometimes referred to as a competitive filter
associative memory (Medsker, Turban and Trippi, 1996).
It can be appreciated that the differing topologies are related to the different
applications to which the networks are applied. Classification (MLP) and pattern
recognition (Kohonen) are two quite different problems. An example of
classification may be given as “does this data correspond16 to a customer with good
credit ratings”, whereas pattern recognition is more commonly associated with
speech or handwriting recognition or recognition of trends or patterns within
financial information (which may return us to the credit problem by a different route).
In summary it can be said that the Multilayer Perceptron works by co-
operation and the Kohonen network by competition between nodes.
Summary:
ANNs offer an alternative to conventional forms of knowledge based systems.
Whilst ANNs drew their original inspiration from studies of the human mind, further
developments in this area are limited by technology. The use of ANNs as a
mathematical tool is, however, both possible and practical at today’s levels of
technology.
The use of self learning and pattern recognition provides a solution to
problems which, using conventional techniques, may have been overcomplicated or
simply not possible. The basic concept of learning by example through the
adjustment of mathematical weights is a reasonable approximation of the process and
allows the internal computations and structure of the network to be treated as a
‘black-box’17.
Network topologies and learning/training methods are problem specific and
it should be appreciated that the correct choice of network, training style, training
data, pre-processing method and activator function all contribute towards the
successful application of the network.
Diagram 12 (overleaf) represents, in the form of a flow diagram, the series of
discrete steps which make up ANN operations.
16 Is it a member of the set of customers with ....
17 Black-box - The exact details of the internal workings need not be known to facilitate use.
Diagram 12 - The operation of ANNs - flow diagram
Chapter 4 - Investigation into advantages,
disadvantages and current application of ANNs.
Introduction:
In order to allow a more complete consideration of the suitability of ANNs for
a given decision making problem, it is necessary to appreciate the advantages and
disadvantages that the use of ANNs brings. The self-learning pattern recognition
ability which gives the ANN its distinct characteristics also creates disadvantages,
some of which are problem-specific, whilst others are universal.
ANNs are currently used for a wide variety of decision support problems.
Perhaps the most commonly cited problem is that of distress modelling (prediction of
bankruptcy), but examples are also found in such diverse areas as production control,
new product development and traffic light sequence (road junction) modelling.
Advantages and disadvantages:
Numerous authors have commented on the advantages and disadvantages of
ANNs and the following, cited from Hammerstrom (1993), provides what could be
regarded as fundamental points of interest.
Advantages:
• They can infer subtle, unknown relationships from the data.
• They are non-linear so that complex problems can be solved more
accurately than by linear techniques.
• They are highly parallel, which makes them run faster on computers with
parallel processors than alternative methods.
Additional benefits are given by Medsker, Turban and Trippi (1996) as:
47. Bryan Mills 1997 Chapter 4 - Advantages and Disadvantages
• The ability to cope with highly correlated input data (multicollinearity; see also
Tucker, 1996).
• A more highly automated input interface is made possible by ANNs’ ability
to process all inputs at once.
• Fault tolerance - due to the high number of nodes, inaccuracy caused by
bad data can often be localised and not affect the accuracy of the ANN as a
whole.
• Generalisation - noisy, incomplete or previously unseen data will still result
in a reasonable response being made, providing the ANN is suitably trained
(Hawley, Johnson and Raina, 1990).
• Adaptability - Training can occur during the ANN’s in-service lifetime,
allowing the ANN to remain up to date.
Hawley, Johnson and Raina (1990) comment on the fact that ANNs, by the
general purpose nature of their structure, are faster to install and maintain than custom
built KBS. Training of ANNs, though time consuming, need not be as technically
difficult (and therefore as expensive) as writing the program structure of a KBS.
ANNs, due to the way in which the relationship of weights is formed, are not
prone to ‘crashing’ as a result of incomplete or inaccurate data but are often said to
degrade gracefully over time as the weight values alter. This is often cited as an
advantage, but it should be borne in mind that something which has happened
progressively over time may not be noticed until damage has occurred, and without an
adequate control process, may not be appreciated at that point either.
Disadvantages:
Hammerstrom (1993):
• They may fail to produce a satisfactory solution because of insufficient data
[for training] or because no learnable (sic) function exists.
• They may produce results from a complex machine learning procedure that
has no straightforward cause and effect origin that can be easily explained.
• They can be slow and expensive to train.
• ANN’s computational speed, in the finished application, depend linearly on
the number of connections and, roughly [approximately], the square of the
number of nodes.
To this list of disadvantages should be added ANNs’ most criticised fault:
• ANNs are not capable of demonstrating the logical reasoning behind the
result obtained - the black box approach (Farrar, Tucker and Bugmann,
1997, Hopgood, 1993, Medsker, Turban and Trippi, 1996 and Tucker and
Farrar, 1996).
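Hammerstrom's point that computational cost follows the number of connections, and so roughly the square of the number of nodes, can be checked with a small calculation. The sketch below assumes a fully connected layered network without bias terms, in which the number of connections is the sum of the products of adjacent layer sizes; the function name is invented for the example.

```python
def count_connections(layer_sizes):
    """Count the weighted connections in a fully connected layered network."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


print(count_connections([3, 4, 4, 2]))   # 3*4 + 4*4 + 4*2 = 36
print(count_connections([6, 8, 8, 4]))   # 6*8 + 8*8 + 8*4 = 144
```

Doubling the number of nodes in every layer quadruples the number of connections, which is the approximately quadratic growth referred to above.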
A problem recognised by Tucker (1996) and Tucker and Farrar (1996) is the relative
‘youth’ of ANNs as a decision making technique. Conventional decision making
techniques have the advantage of many years of testing, both theoretically and in
practice, which have produced models that are both recognised and accepted.
ANNs, by the very fact that they are relatively new and underdeveloped, do
not have the weight of past experience to promote their results. However, as
development within the field is progressing, refining of the technique and empirical
evidence should produce an improved methodology and increased statistical evidence
of accuracy.
Current application of ANNs:
Hawley, Johnson and Raina (1990) provide a comprehensive discussion of
ANN applications in finance from which the following is adapted:
Corporate Finance Applications:
Financial Simulation: Whilst the financial management tasks of a company can be
divided into various smaller and more manageable segments, the complexity of the
company’s internal and external environment is often misrepresented in these
simplified models. ANNs provide a means of linking all segments together during
analysis; they can be tailored to an individual company and are capable of being
dynamic and responsive to change (Donaldson, 1996).
ANNs can be used for credit customer behaviour modelling, planning bad
debt expenses, planning the cyclical expansion and contraction of accounts,
evaluating credit terms and limits, cash management, evaluation of capital
investments, asset and personal risk management (insurance), exchange risk
management, and the prediction of credit cost and availability based on a company’s
financial data. Hawley et al (1990) claim that this area of ANN application offers,
perhaps, the greatest potential for ANN business application.
Prediction: In determining a new policy, direction or product, organisations need to
determine the reaction this choice will create in both present and future investors and
the subsequent effect this will have on their investment decisions. Whilst
conventional decision making techniques are often more cost efficient at solving
problems which have well-identified theoretical underpinning, the problem of investor
behaviour and sentiment is of a complexity more suited to ANNs.
Investors often base their decisions on a wide range of issues and information
concerning both the company and the broader economic environment. It is possible to
train ANNs to mimic the behaviour of investors (using actual investors as training
models) and then determine the effect alterations to company policy and financial
position have on their investment decisions. ANNs offer the opportunity to incorporate
a wide range of input and output information enabling the decision maker to gauge
reaction to change in ways other than alterations in stock price alone (Hawley et al,
1990).
Evaluation: Accurately valuing a target company’s net worth before attempting an
acquisition increases the probability of success, both in terms of acquisition outcome
and with regards to eventual profit. ANNs are trained, in this instance, by exposure to
training sets of target company data, as input, and human expert value estimate as
response. This use of ANNs seeks to copy the behaviour of individuals, including the
incorporation of human “hunches” and intuition, which would make the use of
conventional decision support programming difficult or impossible.
ANNs have been used successfully in a wide range of evaluation problems and
confer advantages including: screening large numbers of companies for potential
undervaluation or other forms of acquisition attractiveness to minimise the decision
maker’s time (who then need only look at “ideal” companies); the ability to copy the
interpretations of a wide range of decision makers; and the ability to automatically
adjust to the decision maker’s changing analytical procedures and selection criteria
over time (Hawley et al, 1990).
Credit Approval: Using a similar training approach to the company evaluation
method, detailed above, ANNs are capable of reducing time and labour by mimicking
the decisions of financial staff in both credit approval and credit limit decisions. In
addition ANNs are able to interpret a wider range of financial statements (providing
they are trained on a wide range) more quickly than their human counterparts,
negating the need for the information to be restated in a standardised form (Jenson,
1992 and Marose, 1990).
Financial Institutions Applications:
Assessing Lending/Bankruptcy Risk: ANNs can provide expert opinion on loans and
lending arrangements to financial institutions in a similar manner to the example of
credit approval discussed above.
Security/Asset Portfolio Management: Due to the unstructured nature of the portfolio
manager’s decision making process and the diversity of information involved ANNs
offer advantages over conventional decision making techniques (Hawley et al, 1990).
Pricing Initial Public Offerings (of ordinary shares): Determining the issue price of
ordinary shares is a complicated process but one which it is essential to optimise
(Brett, 1991). The information is often diverse and of a non-standard format and so
the application of ANNs confers advantages, not found in conventional decision
making techniques, through their ability to generalise and their lack of reliance on an
explicit rule base.
Professional Investors Applications:
Identification of Arbitrage Opportunities: By replicating an expert decision maker’s
reasoning process, a process he or she may not be able to articulate, the ANN is able
to assist in the identification of companies which are about to become victims of a
hostile take-over (thus allowing purchasing of the company’s shares to be
initiated). ANNs offer advantages in their ability to screen large numbers of potential
targets, thus giving the arbitrageur a smaller workload.
The Technical Analyst18: ANNs’ pattern recognising abilities enable the patterns
(hitherto un-calculable) within stock markets to be emulated. Through this evaluating
ability, more accurate predictions of share price movements can be derived
(Davidson, 1996).
The Fundamental Analyst: Industry norm patterns, market conditions and financial
statements can be used to train ANNs to assist in share purchasing in a way similar to
the technical analyst model.
18 Influences on share price which are not related to company trading position.
Summary:
The advantage which ANNs confer over and above more traditional decision
making/support techniques is their ability to discern patterns in large volumes of
data through a process of self-learning as opposed to explicit instruction. This
process enables ANNs to discover patterns or relationships which may have been
overlooked or given too great an emphasis in existing decision support mathematics.
ANNs’ ability to identify patterns also enables recognition of variations in
handwriting, voice and images, and provides opportunities for a wide
range of security applications.
Currently the use of ANNs in business is predominantly within bankruptcy
prediction and financial risk assessment. However, ANNs have been used
successfully in a wide range of operations management, marketing (data mining) and
personnel applications.
ANNs have been shown to offer increased accuracy (Farrar, Tucker
and Bugmann, 1997) and, in many instances, are one of the few methods currently
available (e.g. handwriting and voice recognition). The ability to deal more easily
than conventional methods with non-linearity (Waldrop, 1992) gives the user an
advantage in the highly non-linear markets in which business operates (Cuthbertson
and Gripaios, 1996).
ANNs “operate by a logic known only to themselves” (The Economist, 1995).
The most difficult obstacle to overcome in the promotion of ANNs as a decision
making tool is their lack of interpretability.
53. Bryan Mills 1997 Chapter 5 - Schema
Chapter 5 - Schema for the assessment of the
suitability of ANNs for given problem
Introduction:
Whilst ANNs are an extremely powerful tool, their application is not suited to
every problem. For reasons such as cost, unavailability of data and form of data there
are certain problems which are better suited to other forms of decision support. In
order to maximise the benefit gained from ANN application a process of
problem/method pre-selection is required.
To facilitate the matching of problem19 and solution, a schema has been
developed. By answering a series of relatively simple questions, the user is able to
determine whether ANNs are suitable for the problem they are attempting to solve. In
addition to this both the reasons for and against the use of ANNs, for a given problem,
are discussed. Additionally, a suggestion as to other methods which may prove more
suitable should ANNs not be appropriate is given.
Schema:
Diagram 13 (page 46) represents the schema in the form of a flow-chart. By
following the series of questions, labelled 1 to 10, the user is able to determine
whether ANNs are suitable for a given problem and any problems which may be
encountered in their application. The dotted lines and boxes represent complementary
advice; the solid lines and boxes represent questions, flow and conclusions. The
flow-chart is also described in full in text form.
The schema is also available as a computer program; the instructions for use
are contained in Appendix 3 and the code listings in Appendix 4.
19 The schema concentrates on the use of ANNs for decision making purposes; full systems for factory control etc. can cost in the
region of £25,000 and would require more detailed analysis than is possible within a general purpose flowchart (Horridge, 1997)
This program enables an approach which is more dynamic, multidimensional
and, above all, simpler for the user than is possible on paper alone. The program
(called Net Solver) enables the user to determine whether ANNs are suitable for a
given problem. To make this possible the program records the response made by the
user to a series of questions. These responses enable the program to calculate the
suitability of the problem/decision to ANN use. The result is displayed as both a
percentage and as a ‘progress-bar’ of the sort used within the Windows 20 environment
to show elapsed time. In addition to this result the responses are reiterated, to allow
the user to check that she/he has not made any errors. If the user had entered a project
name this will also be displayed. To enable the user to determine the next step in the
application of ANNs advice is given where it is thought appropriate. The user then
has the option of printing the results, advice and details (see Appendix 5 - Sample
Output).
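The general idea of turning recorded responses into a percentage and a progress-bar can be sketched in Python as below. This is not the Net Solver code (which was written in Visual Basic and is listed in Appendix 4): the questions, the weights, the scoring rule and the function names are all hypothetical and are used purely to illustrate the principle.

```python
def suitability(responses, weights):
    """Return the percentage of the available weight earned by favourable responses."""
    earned = sum(weights[question] for question, favourable in responses.items() if favourable)
    return 100.0 * earned / sum(weights.values())


def progress_bar(percentage, width=20):
    """Draw the percentage as a simple text progress-bar."""
    filled = round(width * percentage / 100.0)
    return "[" + "#" * filled + "-" * (width - filled) + "]"


# Hypothetical questions and weights, invented for this example only.
weights = {
    "Is historical data available?": 3,
    "Is the relationship likely to be non-linear?": 2,
    "Can the result be used without an explanation?": 1,
}
responses = {
    "Is historical data available?": True,
    "Is the relationship likely to be non-linear?": True,
    "Can the result be used without an explanation?": False,
}

score = suitability(responses, weights)
print(f"{score:.0f}%", progress_bar(score))
for question, favourable in responses.items():   # reiterate the responses for checking
    print(question, "Yes" if favourable else "No")
```

Reiterating the responses alongside the score mirrors the program's stated aim of letting the user check for errors in what was entered.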
Throughout the program a fictitious company and telephone number are
mentioned (ABC ANNs on (0110)111222) at points where the user may need
additional advice. It is intended that this program could, with further development, be
used in one of the following ways:
• Distributed free of charge (e.g. via the Internet or computer magazine disks) by
a software/consultancy company, replacing ABC ANNs with its own name and
contact number as part of an advertising campaign.
• Incorporated into ANN software as an introduction.
• Sold as consultancy software.
• Used within education as a teaching aid.
• Distributed free of charge, via the Internet, as a philanthropic act.
The program was written using Microsoft Visual Basic Version 4 Professional
(VB4), using the Microsoft Windows 95 operating environment, and a PC equipped
with a 486/66 processor, 8 MB of RAM and 200 MB of spare hard disk space (see
20 Microsoft, Windows and Visual Basic are all registered trademarks of the Microsoft Corporation