Business Dissertation Thesis

The Application of Artificial Neural Networks to Business Problems

Bryan Mills

University of Plymouth Business School
(Franchised to Cornwall College)

Honours project submitted as partial fulfilment for the degree of
BA Honours in Business Administration

Supervisor: Dr Jon Tucker

14th May 1997

Acknowledgements:
Firstly I would like to thank my supervisor Dr Tucker for his patience and generosity
and, in addition, to acknowledge the contribution he has made to this dissertation. I would
also like to thank Dave Ager, Jill Ferret and Mike Trennary for their tolerance and
encouragement during both the dissertation and the degree programme. I would like to
acknowledge the encouragement I have received throughout the degree from Buzz Banks,
Helen Cobbin and Ken Waller. Also, I would like to take the opportunity to thank Paul
Ingram for his uncompromising and contagious obsession with academia and Tony Butt for
first introducing me to Chaos Theory and non-linearity.


Bryan Mills 1997

Abstract:
Artificial Neural Networks (ANNs) provide a powerful information technology
based tool for decision making purposes. However, present literature on the subject
is often found to be either inaccessible or of limited relevance to (general) business
application. In this report ANNs are described in a more intuitive manner than found
within much of the existing literature. Emphasis is placed upon the use of ANNs
within the business environment, although the study still provides an introduction for
wider application. Misconceptions surrounding ANNs, and Artificial Intelligence in
general, are explored and recommendations are made with a view to their resolution.
The advantages and disadvantages of ANNs are discussed and present applications
are listed with a view to demonstrating the various application possibilities of ANNs.
To enable wider application of ANNs within business, and to reduce misguided
application, a schema has been developed, both as a flowchart and as a computer
program. This schema allows the potential ANN user to critically appraise the use of
ANNs for a given decision making problem.


Contents:

Acknowledgements:
Abstract:
List of Diagrams:
List of Tables:
Glossary of Terms:

Chapter 1 - Introduction
  General Introduction
  Popular Misconceptions Concerning Neural Networks:

Chapter 2 - Discussion of Aims, Methodology and Research Philosophy
  Aim:
  Objectives:
  Benefit of the project to industry and commerce:
  The growth of research in the neural area:
  Methodology and Approach:
  Schema development:

Chapter 3 - Explanation of the Fundamental Concepts of ANNs
  Introduction:
  An outline explanation of the fundamental concepts of Artificial Neural Networks:
  Knowledge Based Systems:
  The difference between Artificial Neural Networks and Conventional Knowledge Based Systems:
  Explanation of the operation of ANNs:
  First Principles:
    Knowledge Based Systems:
  Rule Based System:
  Artificial Neural Networks:
    Overview:
    Components:
    Nodes:
    Weights and bias terms:
    Generalisation:
    Choice of mapping or activator function:
    Data pre-processing:
    Training:
    Topology:
  The Multilayer Perceptron - an example of supervised learning/training:
  The Kohonen self organising net - an example of unsupervised learning/training:
  Summary:

Chapter 4 - Investigation into advantages, disadvantages and current application of ANNs
  Introduction:
  Advantages and disadvantages:
  Current application of ANNs:
  Summary:

Chapter 5 - Schema for the assessment of the suitability of ANNs for given problem
  Introduction:
  Schema:
  Explanation of Schema:
  Summary:

Chapter 6 - Conclusions and Recommendations
  Conclusion:
  Limitations:
  Further Research

Appendix 1 - Example of Training Process
Appendix 2 - Bayesian Updating:
Appendix 3 - Instructions for Running The Computer Program:
Appendix 4 - Computer Code-list
Appendix 5 - Sample Output:
Appendix 6 - Visual Basic as a Programming Language:

Bibliography:

List of Diagrams:
Diagram 1 - Patent Activity
Diagram 2 - Methodology
Diagram 3 - Knowledge Based System
Diagram 4 - Single Neuron Calculation
Diagram 5 - Representation of a Neuron
Diagram 6 - Screen dump of a text file for use in WinNN
Diagram 7 - Class Membership
Diagram 8 - Universe of objects
Diagram 9 - Sigmoid Function
Diagram 10 - The Multilayer Perceptron
Diagram 11 - Kohonen Self Organising Feature Map
Diagram 12 - The operation of ANNs - flow diagram
Diagram 13 - Schema

List of Tables:
Table 1 - Sample Problem
Table 2 - Simplified weight method
Table 3 - Input file explanation
Table 4 - Sigmoid values
Table 5 - Data pre-processing


Glossary of terms:
Activator function - an equation (mapping function) which describes a neuron’s internal state as the weighted total of its inputs, $net = \sum_i x_i w_i - \theta$, where $x_i$ is an input, $w_i$ is the weight acting on it and $\theta$ is the bias term.

Algorithm - a procedure or series of steps used to solve a problem

Autoassociative - mapping the original pattern from noisy or incomplete data

Backpropagation - an algorithm which compares results with expected answers and then passes the difference back through the network to facilitate weight adjustment.

Bias term - A constant (θ), similar to a threshold, applied to each node independently so that the node’s output can be shifted independently of its inputs.

Cell - A neuron

Database - In this instance, a set of facts (data) stored within a computer system

Dependent variable - A variable which will be altered or created by the change in value of an independent variable(s). Normally shown on the left hand side (LHS) of an equation.

EPOS - Electronic point of sale - the computer connection between cash-tills and the central computer within a retail store

EPS - Earnings per share (accountancy measure)

Front-end subsystem - A computer program designed to simplify (humanise) the input and output of data

Fuzzy - A set whose members belong to it to some degree. In contrast, a standard set contains its members either all or none (Kosko, 1993).

Function - A rule which maps an element of one set onto an element of another set; for example, sales level could be said to be a function of demand.

Generalise - The ability to identify a wide range of objects, patterns etc. from a minimal set of key descriptive data

Heteroassociative - mapping an input pattern set to a different output pattern set

Hyperplane - A flat surface (the equivalent of a plane) in a space of more than 3 dimensions, and therefore difficult to represent graphically

Independent variable - A variable which will alter or create the change in value of a dependent variable(s). Normally shown on the right hand side (RHS) of an equation

Inference engine - The part of a knowledge based system’s programming which deduces results from given facts/data

Knowledge based system - A system in which data and control (algorithms) are separated, allowing the computer to respond to a series of differing inputs by calling on a library of information (knowledge base) as opposed to altering variables contained explicitly within the program’s structure.

Mapping function - A rule linking the elements of one set to those of another; usually shown as $F: x \to y$, the function which maps $x$ onto $y$.

Multivariable - Containing a large number of independent variables

Network - A collection of interconnected nodes forming a topology

Neuron - A single activator function, a processing element, a mapping function through which variables must pass, a calculation point

Nodes - Neurons

Non-linearity - A relationship which cannot be represented by a straight line, e.g. equations containing powers, roots, trigonometric or logarithmic functions.


Normalisation - A form of data pre-processing which seeks to give all inputs/outputs a commonality by constraining their values to within a pre-determined range

Pre-processing - Alterations to data before use (normalisation, removal of outliers, ratio splitting). Usually conducted with the intention of increasing the network’s efficiency or of converting non-numeric data to numeric.

Propositional logic/calculus - A step by step inference system for determining whether a given proposition is true or false. There are various forms of propositional logic (modus ponens, modus tollens, denial of the antecedent etc.), but all are based on variations of: if x is true then y must be true/false; if and only if x is true then y is true/false etc. (Eysenck and Keane, 1995).

Ratio splitting - Using the component parts of a ratio separately as opposed to using the result (GP Margin = GP/Sales; use GP and Sales as inputs, not GP Margin)

Real-time - The collection and processing of data as events occur, as opposed to the use of historic data. EPOS works in real-time

ROCE - Return on capital employed (accountancy measure)

Set - A collection of elements defined by a rule which makes them separable from other sets - e.g. men and women are two separate sets (separated by sex) but are also within the common set of humans (separated from other animal forms by species)

Sigmoid - A common ANN activator function. An equation which maps any summed input to a value between 0 and 1, tending towards 1 for large positive inputs and towards 0 for large negative inputs (never reaching either 1 or 0). It is generally given thus:

$$f_{\text{nonlinear}}(x) = \frac{1}{1 + e^{-x}}$$

where x is the summed input and e is the mathematical constant that is the base of natural logarithms (2.71828...)

Topology - In this instance, an attempt to graphically represent the interconnection of nodes within the network. Topology is often one of the key distinguishing features separating different ANNs (others being training method and activator function)

Training method - As ANNs self-learn by exposure to data, it is necessary to have an algorithm which allows the ANN to distinguish between correct and incorrect responses. This may be supervised (told when incorrect and what the output should have been), unsupervised (self-learning pattern recognition) or reinforced (told simply whether correct or incorrect)

Training set - The data used to train the ANN. The available data is usually separated into a training set and a hold-out (test) set

Vector - A quantity which has both magnitude and direction. An ANN’s input consists of a one dimensional array (a vector) of differing x values, $(x_1, x_2, \ldots, x_n)$, which is combined with the weights in the form $x_1w_1 + x_2w_2 + x_3w_3 + \ldots + x_nw_n$, where x indicates input and w indicates weight

Weights - A value which is altered by the ANN to enable the emphasis of the variable upon which it acts to be either strengthened or weakened. A variable coefficient which determines the strength of an input’s effect on output
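The Normalisation and Ratio splitting entries above can be illustrated with a short Python sketch. It assumes min-max scaling into a pre-determined range of 0 to 1, which is only one of several possible normalisation schemes; the function name `min_max` and the sales and gross profit figures are hypothetical and invented purely for illustration.

```python
def min_max(values, lo=0.0, hi=1.0):
    """Rescale values linearly into the pre-determined range [lo, hi]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        return [lo] * len(values)  # constant column: nothing to spread out
    return [lo + (v - v_min) * (hi - lo) / (v_max - v_min) for v in values]


# Hypothetical figures for three periods (not data from this study)
sales = [120.0, 80.0, 200.0]
gross_profit = [30.0, 16.0, 60.0]

# Ratio splitting: present GP and Sales as two separate inputs,
# rather than the single ratio GP Margin = GP / Sales.
inputs = list(zip(min_max(gross_profit), min_max(sales)))
for row in inputs:
    print([round(v, 3) for v in row])
```

Splitting the ratio keeps the information about scale (a small firm and a large firm with the same margin are no longer identical inputs), while normalisation stops the larger Sales figures from dominating the weighted sum simply because of their size.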


Chapter 1 - Introduction
General Introduction
Business involves a complex mix of people, policy and technology, and exists
within the constraints of economics and society (Clifton and Sutcliffe, 1994). It is
often the precise way in which these items are mixed that can create either success or
failure for an organisation. This presents the manager with two key tasks: the
efficient collection of all relevant information, and its analysis. From this analysis the
manager will be able to formulate strategies, define objectives and implement plans
for their fruition. The provision and analysis of information within business is often
referred to as the decision support process, and the systems adopted to support it are
referred to as decision support systems (DSS).

Business decisions can often be viewed as the solution of various
mathematical problems. Whether it be determining the price level of a product, the
benefit of expansion into a new market, staff levels or the probability of a project
failing, mathematics usually plays a role. In fact, given the overriding objective of
“maximising shareholders’ wealth” (McLaney, 1994) found within all profit making
organisations, and because wealth and profit are measured numerically, it would be
difficult, if not impossible, to view the organisation meaningfully in any other way.¹

One of the key problems in any decision is the availability and cost of
“perfect information”. Given perfect information (all the facts concerning a decision,
with complete confidence that they are correct) there would be little for the business
manager to decide; it would simply be a choice of the project which maximised
overall contribution.² Most decisions, however, are not based on perfect information.
This is generally due to a combination of the prohibitive cost of gathering such
information, its limited availability and the intrinsic unpredictability and complexity
of the markets in which business operates.

¹ Non-profit making organisations seek cost efficiency, which is another mathematical measure.

Ongoing developments in the field of Information Technology have enabled
the gathering, storage and retrieval of much larger quantities of information than was
previously possible. Stock markets can be observed in real-time, supermarkets know
the exact quantities of goods on their shelves (via Electronic Point of Sale (EPOS))
and their customers’ weekly shopping lists (via loyalty cards), and companies can
measure the exact output of machines on the shop floor (via Computer Aided
Manufacturing). This information is, however, worth only as much as the gain
derived from its ownership. Being able to quote a share price or stock level is useful,
but the information has already become historic. What is required in decision making
is a means by which to identify patterns and trends in the large volumes of data
currently available, and to increase confidence in predictions based on this data to
an acceptable level.

The capability of Artificial Neural Network (ANN) models to recognise
patterns and trends in large volumes of data has meant that they are increasingly
being used for a variety of industrial and commercial applications.

ANNs are a form of computer software which took their original inspiration
(McCulloch and Pitts, 1943) from man’s limited understanding of the workings of the
human brain. Research has been carried out in this area for two broad reasons. The
first, and original, was an attempt to model the human brain electronically in order to
develop a greater understanding of its operation. The second, and most relevant in this
instance, is the development of these models as a mathematical tool for studying
patterns and relationships in data.

² Overall contribution - the manager would consider the organisation’s other ventures, market share, market growth and long-term survival in his/her decision.

The mathematics which forms these models is particularly useful when
dealing with non-linear problems, i.e. problems which cannot be graphed by use of a
straight line, of which there are numerous examples in business (demand/price,
production level/cost, share price/ROCE/EPS): a change in the independent
variable (price) does not guarantee a proportionate change in the dependent variable
(demand). ANNs are also capable of dealing with dependent variables which have
several variables acting on them (e.g. interest rates, inflation and estimation of risk
in a cost of capital calculation), where the relationship between each is
theoretically appreciated and explainable but not easily converted into an equation or
algorithm (Klimasauskas, 1991; Scocken, 1994).
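The point that a network of neurons can represent relationships which no straight line can capture may be illustrated with the textbook XOR ("exclusive or") problem, which is not drawn from this study: the output is 1 when exactly one of two inputs is 1. No single straight-line boundary separates the 1s from the 0s, but three sigmoid neurons arranged in two layers can reproduce the table. In this minimal Python sketch the weights and bias terms are hand-chosen, hypothetical values rather than trained ones; each neuron forms the weighted sum of its inputs minus its bias term θ and passes it through the sigmoid, as defined in the glossary.

```python
import math


def sigmoid(net):
    """Squash a summed input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def neuron(inputs, weights, theta):
    """One processing unit: net = sum(x_i * w_i) - theta, passed through the sigmoid."""
    net = sum(x * w for x, w in zip(inputs, weights)) - theta
    return sigmoid(net)


def xor_network(x1, x2):
    """Two-layer network with hand-chosen (hypothetical) weights that computes XOR."""
    h_or = neuron((x1, x2), (20.0, 20.0), 10.0)        # close to 1 if either input is 1
    h_nand = neuron((x1, x2), (-20.0, -20.0), -30.0)   # close to 0 only if both inputs are 1
    return neuron((h_or, h_nand), (20.0, 20.0), 30.0)  # close to 1 only if both hidden units are


for a, b in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print(a, b, round(xor_network(a, b), 3))
```

Each hidden neuron on its own draws a single straight-line boundary; it is the combination of the two, in a second layer, that produces the non-linear result.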

It is the ability to deal with non-linearity, multiple variables and large volumes
of data which gives ANNs what are perhaps their most impressive features: pattern
recognition and self-learning. ANNs receive their information (their knowledge) via a
process of training. Sets of data and desired results are passed through the network
until the computer is able to create the desired result to a reasonable degree of
accuracy. This is made possible by the network’s ability to generalise from the training
data presented to it and to form an output, given new inputs, based on this
generalisation. Once this training stage is complete, a problem (independent variables)
can be input and a result (dependent variable) is generated.
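The training process described above can be sketched for a single sigmoid neuron under strong simplifying assumptions. The data are a hypothetical toy problem (desired output 1 when x1 + x2 > 1), the inputs are centred on zero as a simple pre-processing step, and the weights are adjusted by batch gradient descent on the difference between desired and actual outputs. This is the one-layer case of the error-correction idea behind backpropagation, not the specific method used elsewhere in this study. Points held out of training test whether the neuron has generalised; the names `train_neuron`, `rate` and `epochs` are illustrative choices.

```python
import math


def sigmoid(net):
    """Squash a neuron's summed input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


def make_data():
    """Hypothetical toy data: the desired output is 1 when x1 + x2 > 1, else 0.

    Inputs come from a 5 x 5 grid on [0, 1] and are centred on zero
    (x - 0.5) as a simple pre-processing step. Grid points with an even
    i + j form the training set; the rest form the hold-out (test) set.
    """
    train, test = [], []
    for i in range(5):
        for j in range(5):
            if i + j == 4:
                continue  # skip points lying exactly on the line x1 + x2 = 1
            x = (i / 4 - 0.5, j / 4 - 0.5)
            t = 1 if i + j > 4 else 0
            (train if (i + j) % 2 == 0 else test).append((x, t))
    return train, test


def train_neuron(data, rate=0.5, epochs=2000):
    """Batch gradient descent for one sigmoid neuron, net = x1*w1 + x2*w2 - theta."""
    w = [0.0, 0.0]
    theta = 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_theta = 0.0
        for (x1, x2), t in data:
            y = sigmoid(w[0] * x1 + w[1] * x2 - theta)
            err = t - y  # desired output minus actual output
            grad_w[0] += err * x1
            grad_w[1] += err * x2
            grad_theta += err
        # Step that reduces the cross-entropy error, averaged over the training set
        w[0] += rate * grad_w[0] / n
        w[1] += rate * grad_w[1] / n
        theta -= rate * grad_theta / n  # theta enters net with a minus sign
    return w, theta


def accuracy(data, w, theta):
    """Fraction of points where the thresholded output (>= 0.5) matches the target."""
    hits = 0
    for (x1, x2), t in data:
        y = sigmoid(w[0] * x1 + w[1] * x2 - theta)
        hits += int((y >= 0.5) == (t == 1))
    return hits / len(data)


train_set, test_set = make_data()
w, theta = train_neuron(train_set)
print("training accuracy:", accuracy(train_set, w, theta))
print("hold-out accuracy:", accuracy(test_set, w, theta))
```

The hold-out points (sums of 0.25, 0.75, 1.25 and 1.75) never appear in training, so correct answers on them are a small-scale example of generalisation rather than memorisation.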

Current applications of ANNs include, amongst others: stock and money
market forecasting (Trippi and Turban, 1996), face and handwriting recognition
(Rogers, Kabrisky, Ruck and Oxely, 1994), recognising whether station platforms are
busy or not, missile guidance systems, voice recognition, voice control of computers,
data mining (Wiggins, 1994), industrial signal processing (Wiggins, 1994), modelling
of traffic flow (Recker, 1995), human resource management (redundancy selection)
(Coit, 1996), new product feasibility studies (Madu, 1995), risk evaluation, chemical
analysis, weather forecasting and resource management (Davalo and Naïm, 1991), a
complement to business decision support systems (Scocken, 1994), operations quality
control (Horridge, 1997) and the processing of marketing data.

Popular Misconceptions Concerning Neural Networks:
The subject of Artificial Neural Networks (ANNs) is an example of a name
not being self-explanatory. The description ‘Artificial Neural Network’ is a
misnomer: it suggests an artificial representation of the human mind (the brain being
composed of a network of neurons). Exciting though the creation of an ‘artificial
mind’ would be, the ANNs currently in operation are little more than computer
programs capable of doing clever ‘sums’. The cleverness of these ‘sums’, however,
is not to be taken lightly. Systems have been developed which are able to identify
patterns in very large samples of data, produce a method of calculating relationships
between data where conventional mathematics would have been inadequate for
practical application, and represent a very strong possibility of developing
systems better suited to understanding our own fuzzy³ world.
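To make the idea of a fuzzy set more concrete, the Python sketch below contrasts fuzzy membership with a conventional (crisp) set, using the temperature example of footnote 3. The triangular shape for ‘warm’, the ramp for ‘hot’ and the 20, 25 and 30 degree break points are hypothetical choices for illustration, not definitions taken from Kosko (1993).

```python
def warm(t):
    """Hypothetical membership of 'warm': 1 at 20 degrees, falling to 0 at 10 and 30."""
    return max(0.0, 1.0 - abs(t - 20.0) / 10.0)


def hot(t):
    """Hypothetical membership of 'hot': 0 at or below 20 degrees, 1 at or above 30."""
    return min(1.0, max(0.0, (t - 20.0) / 10.0))


def crisp_hot(t):
    """A standard (crisp) set: a temperature is either hot or it is not."""
    return t >= 25.0


for t in (18, 19, 28):
    print(t, "warm:", round(warm(t), 2), "hot:", round(hot(t), 2), "crisp hot:", crisp_hot(t))
```

Under the fuzzy definitions 28 degrees is mostly hot but still slightly warm, whereas the crisp set must place every temperature wholly inside or wholly outside ‘hot’.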

As a subject, ANNs are fairly inaccessible and fraught with misconceptions.
The subject is clouded by two separate, but interrelated, explanations, and this
difficulty is further compounded by the absence of accessible knowledge. On the one
hand there are the works of various academics and academic institutions. On the
other is the general public’s⁴ understanding of what ANNs represent, if they are
aware of them at all.

³ Fuzzy - e.g. language: hot, warm and cold mean different temperatures to different people and the boundary between hot and warm (for example) is not clear (is 18 degrees warm, 19 degrees hot and just as hot as 28 degrees?)
⁴ Used here simply to describe those outside of the fields of Mathematics, Computing and Psychology - not intended to be in any way derogatory.

The public’s understanding stems mainly from the world of science fiction
and ‘popular’ science programmes. It is a world of Arthur C. Clarke’s HAL (2001: A
Space Odyssey, etc.) and Philip K. Dick’s Blade Runner⁵: thinking machines which
inevitably turn on their creators, with devastating results.⁶ This understanding is not
assisted by the anthropomorphic nature of the language surrounding ANNs and the
willingness of some academics to emphasise this definition (for example, Professor
Aleksander of Imperial College London: “Magnus [a computer program] has a mind of
its own” (Millar⁷, 1996)). Words such as ‘thinking’, ‘neuron’ and ‘understanding’
all point towards machines which may eventually replicate the human thought process
to the point of being conscious. The reality of the situation is quite different: at
present computers can represent little more than a few thousand neurons, compared to
10,000 in a cockroach’s brain and 100 billion in a human’s (The Economist, 1995).

The academic world often uses anthropomorphic terms to overcome some of
the limitations of language and the mathematical nature of more correct descriptions.
For example, in the development of a computer system to control the heating, lighting
and ventilation of an office building, one may be tempted to use expressions such as
“to develop a system which is aware of its environment”. However, use of the word
‘aware’ may suggest consciousness, and use of ‘its environment’, as opposed to ‘the
environment in which it operates’, could suggest ownership and, therefore, existence
beyond being an object. The difficulty stems from the absence of a more correct, and
equally convenient, shorthand. The alternative, “to develop a system which
constantly monitors the surrounding environment and compares this information with
a pre-programmed set of ‘ideal’ conditions”, explains the process with a reduced
likelihood of confusion, but is not necessarily more accurate. The reader’s frame of
reference provides the key to which language would be more appropriate.

⁵ More correctly, the original book was called Do Androids Dream of Electric Sheep?
⁶ The defence analyst and writer Warwick Collins has gone so far as to call on the government to restrict the human attributes scientists can give programmes/machines (Millar, 1996, The Guardian Newspaper, 17/12/96, page 4, eighth paragraph).
⁷ The Guardian Newspaper, 17/12/96, page 4, second paragraph.

The use of such terminology creates few problems within the field, because the
level of understanding there is such that many of the words used are recognised as
having two separate meanings, the computer related meaning and the human related
meaning, for example:

Neuron -
• Human related meaning - a cell which responds to various inputs by producing responses - a processing unit.
• Computer related meaning - a part of an ANN computer program which performs a calculation - a processing unit.
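The computer related meaning of a neuron can be made concrete with a minimal Python sketch of a single processing unit, using the activator function and sigmoid given in the glossary: the weighted sum of the inputs minus the bias term, passed through 1/(1 + e^-net). The function name `neuron_output` and the input, weight and bias values are hypothetical, chosen only to show the arithmetic.

```python
import math


def neuron_output(inputs, weights, theta):
    """Return (net, output) for one neuron: net = sum(x_i * w_i) - theta, output = sigmoid(net)."""
    net = sum(x * w for x, w in zip(inputs, weights)) - theta
    output = 1.0 / (1.0 + math.exp(-net))
    return net, output


# Hypothetical values: two inputs, their weights and a bias term
net, out = neuron_output([1.0, 0.5], [0.4, -0.2], 0.1)
print(f"net = {net:.4f}, output = {out:.4f}")  # prints: net = 0.2000, output = 0.5498
```

Here net = (1.0 × 0.4) + (0.5 × -0.2) - 0.1 = 0.2, and the sigmoid turns this into an output of roughly 0.55: a calculation, with nothing resembling thought involved.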

The definitions are similar and would appear to suggest that, if a significant number
of ‘computer neurons’ were assembled, a human brain could be replicated. Whilst
this formed the inspiration behind some of the early research in the field (for example
Rosenblatt, 1958, 1961), modern theory points to a level of complexity within the
human brain which makes the early optimism seem naive at best.

A more comprehensive discussion of human and machine consciousness is
found in Penrose (1988), The Emperor’s New Mind, and Penrose (1994), Shadows
of the Mind.

This thesis is intended to explain Artificial Neural Networks in such a way as
to reduce some of the confusion which often surrounds the topic. In addition, it is
intended to simplify the application of ANNs to a given problem by the
development of a schema (both paper based and as a computer program). This schema
is intended to simplify the choice faced by the manager when considering which
mathematical tools to use in decision, classification and control problems. To
enable the full value of this schema to be realised, the thesis begins with a
comprehensive review and simplification of existing literature. As previously
discussed, the confusion stems from three broad areas (media hype, anthropomorphic
descriptions and texts aimed at a specialist scientific reader) and it is intended that
this thesis will contribute towards redressing this balance.

Chapter 2 - Discussion of Aims, Methodology and Research Philosophy
Aim:
This study aims to develop a level of understanding from which the business manager (who is unlikely to be an IT specialist) can establish the relative merits and demerits of the ANN technique for business decision support analysis.

The project aims to draw on some of the more accessible academic texts with a view to creating a more intuitive guide to ANN use, aimed at the business manager and student. To aid this explanation a schema (system) will be developed whereby the reader can assess the suitability of ANNs for a problem they wish to solve. To assist in the discussion of the suitability of ANNs for given problems, there will also be an assessment of their current uses and of the advantages and disadvantages that their application presents.

Objectives:
1) To conduct a literature review of the fundamental concepts underlying ANNs.

2) To examine the existing use of ANNs.

3) To develop a system to enable problems to be assessed for their suitability for ANN application.

Benefit of the project to industry and commerce:
Progress in the development of ANNs is closely tied to the development of computer equipment. It is only within the past five years that computing power has become cheap enough to make ANN use a viable possibility. However, ANNs have remained largely the domain of the scientist and mathematician for the past 45 years, and there are few accessible texts for the non-specialist.

The field of ANNs contains possible solutions to business problems not fully addressed by present mathematical techniques (Tucker, 1997). As Gleick (1993) and Waldrop (1992) have commented, non-linear patterns are rife in the enormous volumes of information produced by industry and commerce (e.g. the financial pages, actuarial data, market research responses). ANNs can enable the user to analyse such data more accurately than traditional (typically linear) problem-solving techniques, offering a potential commercial advantage to many industrial sectors.

The growth of research in the neural area:
The field of ANNs is expanding rapidly, and this expansion is closely linked to technological developments in the IT field. As this area continues to develop⁸ there will be an increasing range of opportunities in the field of ANNs (Medsker, Turban and Trippi, 1996). Funding of research within the field of ANNs is continuing, with the Japanese government having budgeted $250 million over the next 10 years and the US government having pledged research funding of $400 million over the next six years (The Economist, 1995).

Diagram 1 - Patent Activity (patents registered in the USA per year, 1986-1992, for ANN, Comp. Int and Combined; vertical axis 0-300)

8 Moore's Law suggests a doubling of the number of transistors on a chip every 18 to 24 months (J. Scholfeild, 1996, The Guardian Newspaper (31/10/96) page 3 Online Section).


Diagram 1 shows patent activity in the USA for the years 1986-92. The graph suggests that the growth of work within this field is almost exponential. It is also important to note that the full extent of the application of ANNs within business (particularly finance) has yet to be realised (Farrar, Tucker and Bugmann, 1997).

Methodology and Approach:
The project is based mainly on a comprehensive literature survey and review

of texts within the field of ANNs. The literature search was conducted in the first

instance to develop a clear understanding of the subject. From this, a succinct

explanation of the concepts underpinning artificial neural networks, aimed at business

managers, has been produced. The greater understanding engendered by the literature

research provides the basis for an analysis of the advantages and disadvantages of the

use of ANNs and forms the foundation of the schema development.

Diagram 2 - Methodology

The above chart (Diagram 2) represents the flow of tasks from the development of the original synopsis to the conclusions and recommendations.

Schema development:
The schema, which forms the most pragmatic part of the thesis, was developed from the literature research. The schema seeks to answer the question “Do ANNs offer a realistic solution to a given problem?”. As ANNs are capable of dealing with a variety of problems, and as the business community usually has a variety of different problems under review, the schema is intended to be general in its approach whilst maintaining effectiveness and accuracy.

The schema is developed both as a flow-chart and as a computer program.

By establishing the specific data and training requirements of ANNs it is possible to

construct a series of questions of a non-technical nature, which the manager can

consider concerning the problem under review. The schema follows the flow of the

responses and culminates in a suggestion for further action. The reasons for the

suggested actions are explained, allowing the manager to consider various courses of

action depending on the resources available to him or her. Where appropriate the

schema will suggest alternative decision making techniques which could prove more

cost efficient or accurate than the use of ANNs.


Chapter 3 - Explanation of the Fundamental Concepts of ANNs
Introduction:
This chapter will seek to place ANNs in the broader context of computer software. A highly simplified description of the workings of ANNs will follow. Once this basic understanding has been established, a more detailed explanation will follow, which is intended to equip the reader with a reasonable level of knowledge of the topic, enabling either further study or practical application.

An outline explanation of the fundamental concepts of Artificial Neural Networks:
An ANN is simply a computer program which, through the adjustment of mathematical weights, is able to create a model capable of producing results (usually in the form 1 or 0, or scaled as decimals from 0 to 1) for a given set of numeric input data, to a reasonable degree of accuracy. The network will often include a front-end subsystem (Attrasoft User’s Guide and Reference Manual, 1996) to handle both data encoding and data decoding:

Data encoding: to convert user-application data to neural input data.

Data decoding: to convert neural output data back to user-application data.
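The two front-end tasks can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Attrasoft front end: it assumes a hypothetical yes/no application field ('y' or 'n') that is encoded as 1 or 0 for the network, and it decodes a network output by treating any value of 0.5 or more as 'y'. The function names and the 0.5 threshold are assumptions made for the example.

```python
def encode(answer: str) -> float:
    """Data encoding: convert a user-application answer ('y' or 'n') to neural input data."""
    mapping = {"y": 1.0, "n": 0.0}  # hypothetical coding scheme
    return mapping[answer.strip().lower()]


def decode(output: float, threshold: float = 0.5) -> str:
    """Data decoding: convert a neural output between 0 and 1 back to 'y' or 'n'."""
    return "y" if output >= threshold else "n"


inputs = [encode(a) for a in ["n", "Y", "y"]]
print(inputs)        # [0.0, 1.0, 1.0]
print(decode(0.93))  # y
print(decode(0.12))  # n
```

Keeping encoding and decoding outside the network itself means the network only ever sees numbers, while the user only ever sees application data.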

ANNs can be considered as part of the larger group of computer based

techniques referred to as Knowledge Based Systems.

Knowledge Based Systems:
There are numerous forms of computer systems which fall under the general

heading of Knowledge Based Systems (KBS). This use of computing power can be

defined as:

“a system within which data is analysed by comparison with sets of pre-obtained

data by following specific rules and/or weighted relationships” (author)

To facilitate this comparison the system will require a set (library, files, historic

records) of knowledge. This knowledge is the basis upon which the system operates

and can take numerous forms:

• Financial Data - credit limit, accounting ratios, past sales figures

• Human Resource Data - qualifications, age, experience (years)

• Operational Data - machine failures (frequency), tolerances, re-order levels

As can be seen from the above examples, the knowledge base is often a form of database of the sort now commonly found within most organisations. The difference between a KBS and a conventional database is the level of interrogation and control which is placed within the system’s remit. As opposed to merely storing data, the system will be called upon to ‘trawl’ through the data to identify trends and patterns of behaviour, or it may use its knowledge to instigate some form of action. For example, if a bill became overdue the system could issue a reminder without the need for an operator to intervene. This is possible because the system knows today’s date, the date the last payment was made, the difference between the two, and the company’s policy on ‘debtor days’. This example also indicates the limits of the understanding involved: the system’s ‘knowledge’, in this instance, is in no way knowledge in the sense that a human would know what it is to have an overdue bill.

It becomes apparent that many modern databases are capable of achieving similar results to knowledge based systems. The difference between the conventional⁹ knowledge based system and the database is becoming increasingly subtle, and is more a matter of emphasis on use as opposed to structure. Most databases (Microsoft’s Access, for example) are capable of interrogating data and also of issuing notification should this be required.

9 Conventional as opposed to ANNs

The difference between Artificial Neural Networks and Conventional Knowledge Based Systems:
As previously discussed, ANNs fall under the broad heading of KBS; however, it is important to recognise that there are fundamental differences between ANNs and other KBSs. Whilst a KBS has the rules and relationships concerning its knowledge programmed into the system (albeit kept separate from the knowledge), ANNs develop their ‘own’ rules and relationships through a process of self-learning.

The self-learning abilities of ANNs are most simply explained by example:

Suppose we wish to establish the relationship within the following set of data:

Advertising Spend (£)   100   150    50    10   200
Sales (£)               300   450   150    30   600

Table 1 - Sample Problem

From the above table, by dividing sales by advertising spend (or by drawing a graph), it is quite possible to estimate that sales are three times advertising spend. It is possible to estimate this figure because a) we can recognise, and could demonstrate, a relationship between the variables, b) there are relatively few variables, and c) this permits a simple approach to the formulation of an equation (relationship). It can be appreciated that a more complex relationship may exist, which is beyond the simple approach used so far. Solving a multivariable and non-linear¹⁰ relationship would require the use of statistical techniques which are often complicated and/or rely on a degree of approximation.

10 Non-linear - a relationship whose graph is a curve as opposed to a straight line; its equation may contain, for example, powers such as $x^2$.

ANNs take a different route to establishing the relationship between variables - by adjusting the values of numerical weights within an equation (function). The weights act upon the data to alter its value with the intention of producing the desired result. To enable this process to take place, the system must be exposed to the data one set at a time (e.g. an advertising spend of £100 and sales of £300 is the first set of data in Table 1). The computer will, in the first instance, apply a guess as to the value of the weights to be used (this starting value may be pre-programmed or random (Hopgood, 1993)). This ‘guess’ will almost inevitably prove to be wrong, and the system will alter the weights and retry.

The first set of data will be treated as below:

Advertising Spend (£)   Weight   Function          Result (£)   Desired Result (£)
100                     1        Spend * Weight    100          300
                        2        Spend * Weight    200          300
                        3        Spend * Weight    300          300

Table 2 - Simplified weight method

It can be seen from the above that after a series of iterative steps the system was able to produce the desired result; in this example the resulting weight of 3 would be acceptable for all of the data sets in Table 1.
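The iterative procedure in Table 2 can be sketched as follows. The sketch assumes, as the table does, that the weight starts at 1 and is increased in whole steps of 1 until Spend * Weight equals the desired sales figure; the function name and the step size are illustrative assumptions, and real ANNs use a non-linear function and a learning rule rather than this fixed step. The weight found is then checked against every data set from Table 1.

```python
# Data sets from Table 1: (advertising spend in £, sales in £)
data = [(100, 300), (150, 450), (50, 150), (10, 30), (200, 600)]


def find_weight(spend, desired, start=1, step=1, max_tries=100):
    """Try successive weights until spend * weight matches the desired result."""
    weight = start
    for _ in range(max_tries):
        result = spend * weight
        print(f"weight={weight}: result={result}, desired={desired}")
        if result == desired:
            return weight
        weight += step  # the 'guess' was wrong, so alter the weight and retry
    raise ValueError("no suitable weight found within max_tries")


spend, sales = data[0]
w = find_weight(spend, sales)  # reproduces the three rows of Table 2
# Check that the same weight is acceptable for every data set
print(all(s * w == target for s, target in data))  # True
```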

The function used in the above example is linear, as opposed to the non-linear functions used within ANNs. In addition, the number of variables and the relationship between them are simpler than would normally be encountered (for an example of the more complicated OR problem see Appendix 1).

It is possible to imagine that, if the relationship was more complicated and our weight of 3 proved unsuitable for the next data set, it could be adjusted again and then re-used on both sets until a satisfactory relationship was obtained. Most ANNs have an adjustable degree of tolerance between the ANN output and the training set’s expected result. For example, WinNN has an adjustable target error which determines the acceptable Root Mean Square (RMS) error¹¹; once the RMS error reaches the target, training of that net is said to be complete. Note that the lower the acceptable error, the more refined, and the less generalised, the net becomes.
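The RMS stopping test can be sketched as below. This is not WinNN's own code: the output and target values and the target error of 0.1 are made-up illustrations. The sketch takes the differences between the network's outputs and the expected results, computes their root mean square, and treats training as complete once that figure is at or below the target.

```python
import math


def rms_error(outputs, targets):
    """Root mean square of the differences between network outputs and targets."""
    squared = [(t - o) ** 2 for o, t in zip(outputs, targets)]
    return math.sqrt(sum(squared) / len(squared))


def training_complete(outputs, targets, target_error=0.1):
    """Training is treated as complete once the RMS error reaches the target error."""
    return rms_error(outputs, targets) <= target_error


outputs = [0.12, 0.85, 0.91, 0.08]  # illustrative network outputs
targets = [0.1, 0.9, 0.9, 0.1]      # expected results from the training set
print(round(rms_error(outputs, targets), 4))
print(training_complete(outputs, targets))
```

Lowering `target_error` forces a closer fit to the training data, which is the trade-off against generalisation noted above.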

The procedure described in this simplified model could be said to represent a single neuron (processing unit, cell). To enable more complicated relationships to be developed, ANNs have more than one neuron, and it is not uncommon for the output of one neuron to be the input of another. If these connections were viewed pictorially they would form a network of interconnected neurons, hence the name: Artificial (non-human) Neural (processing units) Network (interconnection of neurons).

Explanation of the operation of ANNs:

First Principles:

Knowledge Based Systems:
As discussed in the introduction, ANNs are a form of software that has the ability to self-learn. Unlike more conventional (rule-based) forms of knowledge-based system, ANNs do not rely on an inference engine (rule interpreter) driven by hard-programmed, explicit rules of the IF...THEN...ELSE pattern. Instead the program uses a series of mathematical weights to establish data relationships. To understand the difference it is first necessary to explain the basic components of knowledge based systems.

11 RMS - the square root of the mean of a set of squared numbers

Knowledge based systems contain three core components: an interface with the user (outside world) to enable both data input (keyboard, sensors, etc.) and output (monitor, servos, printout, etc.), a knowledge base (database) and an inference engine (rule interpreter, instructions, ‘main program’). Two other components are often found within knowledge based systems: an explanation module¹² to enable the reasoning behind a decision to be shown, and a knowledge acquisition module to enable the knowledge base to be built using one or more of the possible acquisition techniques (Hopgood, 1993). Diagram 3 illustrates the relationship between these components:

Diagram 3 - Knowledge Based System
As shown in Diagram 3, the relationship between the components within a KBS is relatively straightforward. Information is gathered from the outside world, stored within a database and, when a query is made, accessed to provide an answer.

Rule Based System:
A rule based system is based, fundamentally, on the IF...THEN...ELSE

structure (propositional logic/calculus). The following illustrates this point:

IF credit level is greater than pre-agreed limit
THEN stop credit and issue reminder
ELSE do nothing

Here the credit level is computed from inputs and the pre-agreed limit is contained within the knowledge base. It is both common and desirable that the information required to process the rule is contained explicitly within the knowledge base, as opposed to implicitly within the program, so that a simpler and more robust method of updating can be used (e.g. entries in a database are changed as opposed to altering the program’s source code) (Hopgood, 1993).

12 ANNs have great difficulty in satisfying this requirement; the reader’s attention is drawn to the discussion in Chapter 4.

Whilst this is a simplified view of the workings of a rule-based system, further developments serve only to refine and build upon this basic methodology (see, for example, Appendix 2 - Bayesian Updating).
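A minimal sketch of the credit rule given earlier, assuming a hypothetical knowledge base held as a Python dictionary of customer records. It illustrates Hopgood's point: the pre-agreed limits live in the knowledge base (data), so a limit can be changed without touching the rule (program). The customer names, field names and figures are invented for illustration.

```python
# Hypothetical knowledge base: pre-agreed credit limits held as data, not code
knowledge_base = {
    "Smith Ltd": {"credit_limit": 5000},
    "Jones & Co": {"credit_limit": 2000},
}


def credit_rule(customer: str, credit_level: float) -> str:
    """IF credit level > pre-agreed limit THEN stop credit and issue reminder ELSE do nothing."""
    limit = knowledge_base[customer]["credit_limit"]
    if credit_level > limit:
        return "stop credit and issue reminder"
    else:
        return "do nothing"


print(credit_rule("Jones & Co", 2500))  # stop credit and issue reminder
# Updating the knowledge base changes behaviour without editing the rule itself
knowledge_base["Jones & Co"]["credit_limit"] = 3000
print(credit_rule("Jones & Co", 2500))  # do nothing
```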

Artificial Neural Networks:

Overview:
The key difference between ANNs and other KBSs lies with the inference engine. As opposed to having a logic imposed on it, the network is allowed to develop its own logic by means of training, either supervised or unsupervised. Weights are used to determine the strength of relationships and there is no IF...THEN...ELSE. Instead the network decides the relevance of inputs and their interconnections based on its own experience (i.e. the way it has been trained).

The network consists of a number of nodes or cells arranged structurally in a predetermined topology. The nodes are grouped in layers: an input layer, one or more hidden layers and an output layer. Each node accepts various inputs, multiplies each by a weight, adds the weighted inputs together, passes the sum through a non-linear function and outputs the result for passing to another cell. During training, the output layer compares the result with the expected answer, and the difference is passed back through the network to allow the weights to be adjusted to correct errors (backpropagation). A simplified single neuron calculation appears thus:

Diagram 4 - Single Neuron Calculation

Pictorially this can be represented thus:

Diagram 5 - Representation of a Neuron

This process is explained in detail below; it would normally be performed by numerous neurons (cells, nodes) within one or more layers at the same time, i.e. in parallel.

Components:
It is important to appreciate that ANNs gain their ability not from a predetermined layout or selection of weights but from the network’s ability to adjust weights and so alter (strengthen or weaken) the connections between nodes. Before attempting to explain the mathematics behind these interconnections, an explanation of the key components of the network is required.


Nodes:
Medsker, Turban and Trippi (1996) comment that most commercial ANNs have between 10 and 1,000 nodes arranged in three layers, and that although four, five or more layers are not unheard of, they are not deemed necessary for business applications.

Hopgood (1993) describes a node’s role as “to sum each of its inputs, subtract a bias term, θ, and pass the result through a non-linear function, $f_{\text{non-linear}}$, known as the activation function”. Hopgood’s emphasis on the bias term is discussed below.

ANNs contain sets of these calculating functions, and Patterson (1996) describes them thus: “Every ANN is composed of a set n of simple neural computing elements (neurons, units, processing elements or PEs, cells)”, where this set of cells can be given as:

$$ C = \{c_i\}, \quad i = 1, 2, \ldots, n $$

Patterson goes on to comment that cells can be grouped into three distinct categories: input, hidden (or interior) and output.

The interior (hidden) cells are the nodes which perform the majority of the calculation process, and are discussed under various headings below (Weights and Bias Terms, Generalisation, Choice of Function). Input cells are the cells which take the initial input of stimuli (discrete keyed values or continuous sensor data), whilst output cells enable the display of results or the control of effectors. The inputs are usually represented by a vector $x$ of $n$ dimensions and the outputs by a vector $y$ of $m$ dimensions (simply put, $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_m$).

The input data often takes the form of a text file in PC based neural nets:

Diagram 6 - Screen dump of a text file for use in WinNN

The above input file demonstrates the relatively simple form of data which may be used in ANN training and operation. The example does not feature scaling of the variables, as this is not required in this instance; however, it does represent the form input files often take. The file represents 4 training sets, each with 2 inputs and 1 output (4,2,1). In the first training set (case) $x_1$ would be 0, $x_2$ would be 0, and the expected result ($y_1$) would be 0. This would be followed by the second set, which would be 0, 1, 1 respectively, and so on for the third and fourth sets. This data represents the commonly used XOR (exclusive OR) example/problem, which gives the result 1 when exactly one of the two inputs is 1 and 0 when the two inputs are the same (Patterson, 1996). The trained network could be used to solve simple yes/no problems, for example:

Account     Purchased Over   Arrange Visit by   Reason
Customer?   200 Units?       Sales Staff?
n           n                n                  Probably not trade customer
n           y                y                  Offer trade account
y           n                y                  Try to increase sales to trade customer
y           y                n                  Credit limit probably reached

Table 3 - Input file explanation

The above example is highly simplified. It does, however, represent the style

of business control system which uses yes/no responses. It is important to note that

the reason for the decision would not be given by the ANN.


Weights and bias terms:
Once the data is entered into the network, each connection from the input layer to a calculation node is used to apply a weight. Patterson (1996) uses the following notation:

$$ \mathrm{net} = x_1 w_1 + x_2 w_2 + x_3 w_3 = \sum_i x_i w_i \qquad \text{(Equation 1)} $$

where $x_i$ is an input variable and $w_i$ is its weight.

Hopgood (1996) subtracts a bias term to give:

$$ \mathrm{net} = \sum_i x_i w_i - \theta \qquad \text{(Equation 2)} $$

whereas Patterson (1996) prefers a fixed input of 1, with weight $w_0$, on one of the input links, where $w_0 = -\theta$ (so that $1 \cdot w_0 = -\theta$). Either method is considered acceptable.
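The node calculation described by Hopgood, and the two equivalent ways of handling the bias, can be sketched as follows. The inputs, weights and θ are arbitrary illustrative numbers, and the sigmoid of Equation 3 (later in this chapter) is assumed as the activation function. The sketch computes net by Equation 2 and again with Patterson's fixed input of 1 carrying $w_0 = -\theta$, showing that both give the same result.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic (sigmoid) activation function: 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


def net_with_theta(x, w, theta):
    """Equation 2: weighted sum of the inputs minus the bias term theta."""
    return sum(xi * wi for xi, wi in zip(x, w)) - theta


def net_with_bias_input(x, w, theta):
    """Patterson's form: an extra input fixed at 1 whose weight is w0 = -theta."""
    x_ext = [1.0] + list(x)
    w_ext = [-theta] + list(w)
    return sum(xi * wi for xi, wi in zip(x_ext, w_ext))


x = [0.2, 0.7, 0.5]   # illustrative inputs
w = [0.4, -0.3, 0.9]  # illustrative weights
theta = 0.1           # illustrative bias term

net1 = net_with_theta(x, w, theta)
net2 = net_with_bias_input(x, w, theta)
print(net1, net2)      # equal, apart from floating-point rounding
print(sigmoid(net1))   # the node's output
```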

The weights are held independently of the variables ($x$) so as to facilitate their adjustment during backpropagation. It is helpful (Patterson, 1996) to view the relationship between the weights, class membership and the bias value in terms of a two-dimensional plot. In a more complicated example the weight vector ($w_i$) would define a hyperplane in $n$-dimensional space, where $n$ is the number of input variables. In this example $n = 2$, so the boundary is a line in a two-dimensional plot.

Diagram 7 - Class Membership

The significant points in Diagram 7 are the offset, which is set by the bias weight ($w_0$), and the slope of the line, which is given by $-w_1/w_2$. (The boundary is the line $w_0 + w_1 x_1 + w_2 x_2 = 0$, i.e. $x_2 = -(w_1/w_2)\,x_1 - w_0/w_2$.) Thus the position of the line is derived entirely from the weights, and future $x$ values will be shown as either belonging to the class or not. Patterson (1996) places particular importance on this boundary line, as he identifies it as the key to the net’s autonomy: through its ability to alter weights, the net defines what is within the set and what is outside it. The example shown is linearly separable, in that its boundary is defined by a straight line (or, in more dimensions, a plane). This is largely due to the simplicity of the example (two-dimensional) and partly due to the fact that it would be intended for use in a single layer network. As an ability to cope with non-linearity is one of the key features of ANNs, they are, of course, capable of dealing with more complicated examples.
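The XOR data used earlier (Diagram 6) is the standard example of a problem that is not linearly separable: no single straight line separates the cases giving 1 from those giving 0, so a single neuron cannot solve it. The sketch below shows how one hidden layer overcomes this. The weights are hand-picked for illustration (an assumption, not the result of training): one hidden node behaves roughly as OR, the other roughly as AND, and the output node fires when OR is true but AND is not. Large weights are used so that the sigmoid behaves almost like a 0/1 switch.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def node(inputs, weights, bias_weight):
    """One node: bias weight (w0 on a fixed input of 1) plus weighted inputs, then sigmoid."""
    net = bias_weight + sum(x * w for x, w in zip(inputs, weights))
    return sigmoid(net)


def xor_net(x1: float, x2: float) -> float:
    h_or = node([x1, x2], [20, 20], -10)         # close to 1 if either input is 1
    h_and = node([x1, x2], [20, 20], -30)        # close to 1 only if both inputs are 1
    return node([h_or, h_and], [20, -20], -10)   # OR and not AND


for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, round(xor_net(x1, x2), 3))
```

Each hidden node on its own draws a single straight boundary of the kind shown in Diagram 7; it is the combination of two such boundaries by the output node that produces the non-linear separation.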

Generalisation:
To deal with n-dimensions and non-linearity ANNs generalise. Patterson

(1996) discusses generalisation in terms of “describing the whole from some of the

parts” and points out that the alternative to an ability to generalise is knowing

everything. It is possible to identify an object by knowing some general rules

involving that class of object without knowing every member of that class. For

example a metal frame with two wheels, a set of handlebars, a saddle and fitting

various size requirements is probably a bicycle. It is not necessary to memorise

every manufacturer’s catalogue.

ANNs generalise by creating a class which exists in weight space, with its boundary given by the mapping function F (Patterson, 1996). Mapping functions are either autoassociative, meaning they recover the original pattern from noisy or incomplete data, or heteroassociative, meaning they map input patterns to different output patterns. Mapping functions are shown mathematically as $F: x \to y$. A non-linear boundary can be shown by the simplified diagram:

Diagram 8 - Universe of objects
From Diagram 8 it is possible to see that the boundary established by the network includes both the training set data and other instances of the data not given in the training set but which would be encountered if more sets of data were made available, thereby giving the network an element of flexibility. The boundary must therefore include all examples in the training set and all examples of data corresponding to the net’s function but not known at the time of training, and must exclude all other data sets. Once this has been achieved the generalisation can be said to have been a complete success. It is apparent that the method of training and the selection of data will be of particular importance to the accuracy of this process.

Choice of mapping or activator function:
As mentioned in the ANN Overview (above), the weighted sum is passed through a non-linear function before proceeding to the next cell or output layer. This is the mapping function referred to above, so that $F: x \to y$, and is known as the activator function (or activation level/summation function - Medsker, Turban and Trippi, 1996). Patterson (1996) suggests that the activator should be a “monotonic nondecreasing function of net”. This simply means that as net increases the output of the function must never decrease: the graph of the function should never slope downwards from left to right, so a higher input can never produce a lower output than a lower input does (for the sigmoid, for example, x = 2 gives y ≈ 0.88 and x = 3 gives y ≈ 0.95). Hopgood (1996) states that “The weights and biases can be learnt, and the learning behaviour of a net depends on the chosen algorithm”. He further states that the sigmoid function is the most commonly used, and Patterson (1996) concurs with this statement. The sigmoid function is given as:

$$ f_{\text{non-linear}}(x) = \frac{1}{1 + e^{-x}} \qquad \text{(Equation 3)} $$
and would appear graphically thus:

Diagram 9 - Sigmoid Function (output, from 0 to 1, plotted against input x from -5 to 5)
The above diagram (Diagram 9) shows some of the key features of the sigmoid function, and hence the reasons for its use. These features include:
• The ability to make all values positive.
• The relatively fine level of discrimination (the slope).
• The fact that all results lie between 0 and 1.
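The sigmoid of Equation 3 is easily computed. The short sketch below, using Python's standard `math` module, reproduces the values quoted above for x = 2 and x = 3 and checks the features just listed: every output is positive and lies between 0 and 1, and the function never decreases as x increases. The range of x values is an arbitrary choice matching Diagram 9.

```python
import math


def sigmoid(x: float) -> float:
    """Equation 3: f(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


xs = [-5, -3, -1, 0, 1, 2, 3, 5]
values = [sigmoid(x) for x in xs]
for x, y in zip(xs, values):
    print(f"x = {x:>2}  f(x) = {y:.4f}")

# All outputs lie strictly between 0 and 1 ...
print(all(0 < y < 1 for y in values))
# ... and the function is monotonic nondecreasing: a larger x never gives a smaller f(x)
print(all(a <= b for a, b in zip(values, values[1:])))
```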

Data pre-processing:
It can be appreciated that the data under investigation may take various forms.

The ANN will require inputs which are of a numeric nature. This does not prevent

non-numeric data being analysed provided it can be converted, with consistency, into

numbers. For example, risk is a common business concept which is regularly

translated from the vague - safe, moderately safe, risky, very risky - to a range of

probabilities (say 100%, 75%, 50%, 25% probability of a favourable event

occurring).

Once the data has been gathered, and given a numeric value if required, the efficiency and accuracy of the ANN can be enhanced by pre-processing. Wasserman (1989) and Patterson (1996) both concentrate on normalisation, which is a common form of pre-processing. Normalisation is a method by which all the data being processed can be given a common minimum and maximum value. Readers familiar with statistics may draw a parallel between this and the standardisation of data used to determine probabilities from the normal distribution curve (NDC), where normally distributed data are converted to a standard scale so that the pre-calculated probabilities of the NDC can be used.

The most common form of normalisation will see all the data converted to values between 0 and 1. This has the advantage both of reducing the difficulty of manipulating large numbers (simply put, it is easier to manipulate, say, 0.2 than 2,000,000) and of enhancing the network’s ability to adjust weights by removing unnecessary emphasis on variables measured in large units (for example, in loan calculations interest rates may be given as percentages or decimals, but loan sizes in millions). Certain activator functions will restrict output to between 0 and 1, regardless of input (Medsker, Turban and Trippi, 1996), for example the sigmoid function mentioned above:

Input (x)   1/(1+e^-x)
0.5         0.6225
5           0.9933
17          0.9999999586
35          0.99999999
35.2        1
2256        1
-5          0.0067

Table 4 - Sigmoid values

It can be seen from the above table that large positive inputs produce values approaching one (the exact point at which the value appears as one depends on the number of decimal places used and on rounding). As the data is multiplied by weights and summed before entering the activator function, no accuracy is gained by maintaining a mixture of large and small numbers, as the function will convert everything to between 0 and 1 regardless. It can also be recognised that the output from the net, having passed through various activator functions, will be in decimal form (between 0 and 1) or, if thresholded, in Boolean form (1 or 0). As the sigmoid function returns values indistinguishable from 0 and 1 for a large range of negative and positive inputs respectively, it is advisable to restrict the inputs and outputs to values between 0.1 and 0.9 (Tucker, 1996). Target values of exactly 0 or 1 could only be reached by driving the net input towards minus or plus infinity, which pushes the weights towards ever larger values, whilst an input of exactly 0 has the effect of negating its weight ($x_1 w_1 = 0$ for $x_1 = 0$).
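Data squashing to the range recommended above can be done with a simple linear (min-max) rescaling. The sketch below is one common way of doing this, assumed here for illustration rather than taken from Tucker (1996): each value is mapped so that the smallest observed value becomes 0.1 and the largest becomes 0.9. The loan sizes are made-up figures.

```python
def squash(values, low=0.1, high=0.9):
    """Linearly rescale values so the minimum maps to `low` and the maximum to `high`."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # All values identical: place them in the middle of the range
        return [(low + high) / 2 for _ in values]
    scale = (high - low) / (v_max - v_min)
    return [low + (v - v_min) * scale for v in values]


loan_sizes = [250_000, 1_000_000, 2_000_000, 500_000]  # illustrative figures in £
print([round(s, 3) for s in squash(loan_sizes)])
```

In practice the minimum and maximum found in the training data would normally be stored and reused, so that new data presented to the trained network is squashed on the same scale.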

There are numerous methods of data pre-processing, and Tucker (1996) contains an accessible description of six common methods: distribution truncation and squashing functions, natural log regression, ratio splitting, positive/negative split, variable pre-selection, and data squashing.

These can be explained briefly as follows:

Technique                         Description
Distribution truncation and       The removal of outliers, i.e. of unusually large or
squashing functions               small numbers, from the data set.
Natural log regression            Using logarithms to convert data into smaller units.
Ratio splitting                   Inputting the numerator (top) and denominator (bottom)
                                  of a ratio separately, rather than the ratio itself.
Positive/negative split           Separating a variable which has both positive and
                                  negative values into two separate variables (e.g.
                                  Profit and Loss, as opposed to a single Profit
                                  variable taking values such as £200 or -£150).
Variable pre-selection            Manually deciding which variables to include and
                                  which to leave out.
Data squashing                    Converting all data so that it lies within a
                                  pre-defined range (0.1 to 0.9 being given as most
                                  appropriate).
Adapted from - Tucker (1996)

Table 5 - Data pre-processing
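Two of the techniques in Table 5 can be sketched in a few lines. The function names and example figures are illustrative assumptions, not Tucker's (1996) definitions: natural logs shrink widely ranging positive values into smaller units, and the positive/negative split turns a single signed profit figure into separate, non-negative profit and loss variables.

```python
import math


def log_transform(values):
    """Natural log regression: convert positive values into smaller units with ln."""
    return [math.log(v) for v in values]


def positive_negative_split(value):
    """Positive/negative split: one signed variable becomes (profit, loss), both >= 0."""
    return (value, 0.0) if value >= 0 else (0.0, -value)


sales = [1_000, 20_000, 2_000_000]  # illustrative figures in £
print([round(v, 2) for v in log_transform(sales)])
print(positive_negative_split(200))   # (200, 0.0): a profit of £200
print(positive_negative_split(-150))  # (0.0, 150): a loss of £150
```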

It is also important to emphasise that the data (variables) used within an ANN should have some theoretical basis for a relationship with the problem before they are included¹³. There is little point in including outside temperature alongside interest rates, inflation, project life and level of risk if the problem under consideration is to determine the ‘cost of capital’ to use in net present value (NPV) calculations.

Training:
By their very nature, ANNs require training. This training can be viewed in a

similar manner to human training in that the activity is repeated until the system

produces a satisfactory result (or it is decided to abandon the training run). Hopgood

(1993) suggests that the training process is, more correctly, an “error reduction”

process. Alternatively Patterson (1996) suggests that it is “adaptive learning in a

dynamic environment” emphasising the flexibility of the method.

There are three separate methods of training: supervised, reinforced and unsupervised. Wasserman (1989) argues that unsupervised training is the only “biologically plausible” method of training ANNs. This desire for biological plausibility suggests that Wasserman is more concerned with the theoretical study and attempted replication of a brain than with the practical application of a mathematical technique. Plausibility should be weighed against ‘usefulness as a tool’ when ANNs are used in applications not related to the study of psychology or neurology. Unsupervised learning will be discussed under the heading of Topology below (Kohonen being one of its originators). Supervised and reinforced learning methods are broadly similar in philosophy, so supervised training is explained first.

13 The appendix of Farrar, Tucker and Bugmann (1997) provides an example of how this could be approached.

In supervised training the outputs obtained from the network are compared with the desired results (the targets supplied with each training example). The difference is referred to as the error, and the weights within the network are adjusted to reduce it (in a multilayer network this is done by backpropagation). This adjustment continues until the sum of the squares of the differences (errors) is minimised, in a way similar to the least-squares 'line of best fit' calculation used in linear regression.

This process can be simplified thus:

• Subtract each output from its target value in the training data.
• Square the difference to remove negative signs[14].
• Add all the squared differences together.
• Compare this total with the acceptable level of error.
• If the error is unacceptable, adjust the weights and begin again.

The minimisation is said to be complete when the error has reached an acceptable level.
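The five steps above can be sketched for the simplest possible case: a single node with a sigmoid activator function, trained by gradient descent on the sum of squared errors. This is a minimal sketch under stated assumptions: the OR training data, learning rate, error tolerance and epoch limit are illustrative choices, and in a multilayer network the weight adjustment would be carried out by backpropagation rather than by the single-node rule used here.

```python
import math
import random


def sigmoid(x):
    """Activator function: squashes any weighted sum into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))


def sum_squared_error(outputs, targets):
    """Steps 1-3: subtract each output from its target, square, and add."""
    return sum((t - o) ** 2 for o, t in zip(outputs, targets))


def train_node(examples, rate=0.5, tolerance=0.05, max_epochs=20000, seed=1):
    """Train one sigmoid node until the sum of squared errors is acceptable.

    examples is a list of (inputs, target) pairs. Returns the weights, bias,
    final error and the number of epochs (passes over the data) used.
    """
    rng = random.Random(seed)
    n_inputs = len(examples[0][0])
    weights = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
    bias = rng.uniform(-0.5, 0.5)
    error = float("inf")

    for epoch in range(max_epochs):
        outputs = [sigmoid(bias + sum(w * x for w, x in zip(weights, inputs)))
                   for inputs, _ in examples]
        error = sum_squared_error(outputs, [t for _, t in examples])
        # Step 4: compare with the acceptable level of error.
        if error <= tolerance:
            return weights, bias, error, epoch
        # Step 5: move each weight in the direction that reduces the error
        # (gradient descent), then begin again.
        grads = [0.0] * n_inputs
        grad_bias = 0.0
        for (inputs, target), out in zip(examples, outputs):
            delta = (target - out) * out * (1.0 - out)
            grads = [g + delta * x for g, x in zip(grads, inputs)]
            grad_bias += delta
        weights = [w + rate * g for w, g in zip(weights, grads)]
        bias += rate * grad_bias
    return weights, bias, error, max_epochs  # training run abandoned


# Footnote 14 in numbers: raw differences of -100 and +100 cancel, squares do not.
print(sum_squared_error([0, 0], [-100, 100]))  # 20000

or_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, bias, error, epochs = train_node(or_data)
print(error <= 0.05, epochs)
```

The `max_epochs` limit corresponds to the option, mentioned earlier, of abandoning a training run that does not reach an acceptable error.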

Reinforced training follows a broadly similar route to supervised training but, rather than being given the level of error, the ANN is merely informed whether it is right or wrong, and continues to adjust its weights until a correct result is identified. Patterson (1996) suggests that the method is seldom used in practice and that attention should instead be paid to supervised and unsupervised learning (other authors make little or no mention of reinforced training).

[14] Removal of negatives: otherwise an error of -100 plus an error of +100 would indicate no error.
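To contrast this with the supervised loop, the sketch below gives a threshold node only a right/wrong verdict for each example and never the size of the error. The adjustment strategy here is an assumption chosen for simplicity and is not a method described by Patterson (1996): random changes to the weights are kept only if they do not reduce the number of 'right' verdicts. The AND task, step size and trial limit are illustrative.

```python
import random


def fires(weights, bias, inputs):
    """Threshold node: outputs 1 if the weighted sum exceeds zero, else 0."""
    return 1 if bias + sum(w * x for w, x in zip(weights, inputs)) > 0 else 0


def critic(weights, bias, examples):
    """Return only right/wrong verdicts, never the size of the error."""
    return [fires(weights, bias, x) == target for x, target in examples]


def reinforced_train(examples, step=0.5, max_trials=10000, seed=3):
    """Randomly adjust weights, keeping changes not judged worse by the critic."""
    rng = random.Random(seed)
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    score = sum(critic(weights, bias, examples))
    for trial in range(max_trials):
        if score == len(examples):  # every verdict is "right"
            return weights, bias, trial
        new_weights = [w + rng.gauss(0.0, step) for w in weights]
        new_bias = bias + rng.gauss(0.0, step)
        new_score = sum(critic(new_weights, new_bias, examples))
        if new_score >= score:
            weights, bias, score = new_weights, new_bias, new_score
    return weights, bias, max_trials  # no fully correct result found


and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias, trials = reinforced_train(and_data)
print(all(critic(weights, bias, and_data)))
```

Because the critic reveals nothing about how wrong an output is, the learner cannot tell in which direction to move its weights and has to rely on trial and error.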

It should be noted that it is possible to over-train a network, so that it fits its training data closely but performs poorly on data it has not seen. To continue the human analogy, this would represent an employee trained, in a vocational way, to such a level that they were only able to perform their present role and no other. This may occur, for example, in a factory where an operative has been doing the same repetitive job for so long that their ability to transfer any of the skills they have learnt becomes hampered.

Topology:
The Multilayer Perceptron and Kohonen's Self Organising Net will be used to give a more detailed explanation of ANN construction.

The Multilayer Perceptron - an example of supervised learning/training:
The multilayer perceptron is most commonly used for non-linear estimation or classification. It is often referred to as a feedforward network[15]. This net comprises the conventional input and output layers, with a programmer (user) defined number of hidden layers and nodes between them. The number of nodes used in the input and output layers is determined by the data. A diagram representing the multilayer perceptron is provided below:

[15] Generic name used to refer to this network in particular; technically, any network in which the data flows only towards the output is feedforward.

Diagram 10 - The Multilayer Perceptron

It is usual for the number of input nodes to equal the number of variables under consideration; likewise, the number of output nodes will equal the number of desired outputs. The numbers of nodes are often used to give the network its name. For example, the network above may be termed a 3-4-4-2 network, as it has 3 input nodes, two layers of 4 hidden (computational) nodes and 2 output nodes, and it would be described as a four-layer network (although Hopgood (1993) indicates that this is disputed, as some claim the input layer should not be counted).
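The naming convention follows directly from the list of layer sizes. The helper below is hypothetical; its `count_input_layer` flag simply reflects the disagreement Hopgood (1993) mentions about whether the input layer should be counted.

```python
def describe_network(layer_sizes, count_input_layer=True):
    """Return the conventional name, layer count and hidden layer sizes.

    layer_sizes lists the nodes per layer from input to output,
    e.g. [3, 4, 4, 2] for the network in Diagram 10.
    """
    name = "-".join(str(n) for n in layer_sizes)
    layers = len(layer_sizes) if count_input_layer else len(layer_sizes) - 1
    hidden = layer_sizes[1:-1]
    return name, layers, hidden


print(describe_network([3, 4, 4, 2]))         # ('3-4-4-2', 4, [4, 4])
print(describe_network([3, 4, 4, 2], False))  # ('3-4-4-2', 3, [4, 4])
```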

It can be seen from Diagram 10 that each calculation node is connected to every node in the next layer, but that no nodes within the same layer (drawn vertically in the diagram) are connected to each other. The weights sit on the connections between the nodes and are adjusted during training, while each node applies a non-linear function (the activator function) to the weighted sum of its inputs. The process could be described as: input; weighting; conversion via the activator function; the result becomes the input for the next layer; weighting; conversion via the activator function; and output to the final layer, where error checking occurs and instigates backpropagation to adjust the weights (note: this method uses supervised training).
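The forward part of this process (weighting, conversion via the activator function, and passing the result on as input to the next layer) can be sketched for the 3-4-4-2 network of Diagram 10. The sigmoid activator function, the bias terms and the random initial weights are assumptions for illustration; error checking and backpropagation are not shown.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_network(layer_sizes, seed=0):
    """Random weights for a fully connected feedforward net.

    Each node after the input layer gets one weight per node in the
    previous layer, plus a bias term.
    """
    rng = random.Random(seed)
    network = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        layer = [([rng.uniform(-1, 1) for _ in range(n_in)], rng.uniform(-1, 1))
                 for _ in range(n_out)]
        network.append(layer)
    return network


def forward(network, inputs):
    """Pass the inputs through each layer in turn; return the final outputs."""
    activations = list(inputs)
    for layer in network:
        # Weight the incoming values, convert via the activator function,
        # and use the results as the inputs to the next layer.
        activations = [sigmoid(bias + sum(w * a for w, a in zip(weights, activations)))
                       for weights, bias in layer]
    return activations


net = init_network([3, 4, 4, 2])
outputs = forward(net, [0.2, 0.5, 0.9])  # three squashed input variables
print(len(outputs))  # 2
```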

The number of both layers and calculation nodes is the programmer's decision for a given problem. It is common to start with a small number of layers/nodes and then increase this number until the desired level of accuracy/speed is achieved.

The Kohonen self organising net - an example of unsupervised learning/training:
As mentioned previously, the Kohonen self organising net is a form of net which learns using an unsupervised method. This type of network is most commonly used for pattern recognition and is often referred to as the Self Organising Feature Map (Patterson, 1996). As well as differing from the Multilayer Perceptron (MLP) in training method and application, it also differs in that it is a single-layer feedforward network rather than a multilayer one. In addition, the network works on a principle of "winner takes all" (Wasserman, 1989). This means that as the information is processed within the layer, one, and only one, node will transmit an output. This is why the method is often referred to as a "competitive" one (Patterson, 1996 and Wasserman, 1989).
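The "winner takes all" principle can be sketched as follows: each node holds a weight vector, the node whose weights are closest to the input is declared the winner, and only that node outputs a signal. Measuring closeness by squared Euclidean distance is an assumption (other similarity measures are also used), and the weight values are hypothetical.

```python
def winner(node_weights, inputs):
    """Index of the node whose weight vector is closest to the input."""
    def distance(weights):
        return sum((w - x) ** 2 for w, x in zip(weights, inputs))
    return min(range(len(node_weights)), key=lambda i: distance(node_weights[i]))


def competitive_output(node_weights, inputs):
    """Winner takes all: the winning node outputs 1, every other node 0."""
    win = winner(node_weights, inputs)
    return [1 if i == win else 0 for i in range(len(node_weights))]


nodes = [[0.1, 0.1], [0.5, 0.5], [0.9, 0.9]]  # hypothetical weight vectors
print(competitive_output(nodes, [0.8, 0.7]))  # [0, 0, 1]
```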

Diagram 11 - Kohonen Self Organising Feature Map
Unlike the Multilayer Perceptron, the Kohonen net learns in an unsupervised manner. As the net is attempting to replicate a pattern in various sets of data, it trains by continually processing different data sets until it is satisfied that each new run will reproduce the results of previous runs (within pre-set tolerances). By analogy, a child is not always taught a word by being told it repeatedly and corrected whenever he or she says it incorrectly, but rather learns by being exposed to numerous examples of speech and developing the ability to reproduce these patterns independently. This learning style is sometimes referred to as competitive filter associative memory (Medsker, Turban and Trippi, 1996).
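A minimal sketch of Kohonen-style unsupervised training is given below. It assumes a one-dimensional row of nodes, a learning rate and neighbourhood radius that shrink over time, and a stopping rule that ends training once a complete pass over the data moves no weight by more than a pre-set tolerance (one way of reading "within tolerances pre-set"). No targets are supplied: the winning node for each input, and its neighbours, are simply moved part of the way towards that input. All parameter values and the sample data are illustrative.

```python
import random


def train_som(data, n_nodes=4, rate=0.5, radius=1, tolerance=1e-3,
              max_epochs=500, seed=0):
    """Train a 1-D self organising map; return the node weights and last epoch."""
    rng = random.Random(seed)
    dim = len(data[0])
    nodes = [[rng.random() for _ in range(dim)] for _ in range(n_nodes)]
    for epoch in range(max_epochs):
        largest_change = 0.0
        samples = data[:]
        rng.shuffle(samples)
        for x in samples:
            # Competition: find the node closest to this input.
            win = min(range(n_nodes),
                      key=lambda i: sum((w - v) ** 2 for w, v in zip(nodes[i], x)))
            # Neighbourhood update: the winner and any node within `radius`
            # positions of it move part of the way towards the input.
            for i in range(max(0, win - radius), min(n_nodes, win + radius + 1)):
                new = [w + rate * (v - w) for w, v in zip(nodes[i], x)]
                largest_change = max(largest_change,
                                     max(abs(a - b) for a, b in zip(new, nodes[i])))
                nodes[i] = new
        if largest_change < tolerance:  # this run reproduces the previous one
            break
        rate *= 0.95  # learn more gently as training proceeds
        if epoch % 20 == 19 and radius > 0:
            radius -= 1  # shrink the neighbourhood
    return nodes, epoch


data = [[0.1, 0.2], [0.15, 0.1], [0.8, 0.9], [0.9, 0.85]]  # hypothetical squashed data
nodes, last_epoch = train_som(data)
```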

It can be appreciated that the differing topologies reflect the different applications to which the networks are applied. Classification (MLP) and pattern recognition (Kohonen) are two quite different problems. An example of classification would be "does this data correspond[16] to a customer with a good credit rating?", whereas pattern recognition is more commonly associated with speech or handwriting recognition, or with the recognition of trends or patterns within financial information (which may return us to the credit problem by a different route).

In summary, the Multilayer Perceptron works by co-operation between nodes and the Kohonen network by competition between them.

Summary:
ANNs offer an alternative to conventional forms of knowledge based systems. Whilst ANNs drew their original inspiration from studies of the human mind, further development in that direction is limited by technology. The use of ANNs as a mathematical tool is, however, both possible and practical at today's levels of technology.

The use of self-learning and pattern recognition provides a solution to problems which, using conventional techniques, may have been overcomplicated or simply not possible. The basic concept of learning by example through the adjustment of mathematical weights is a reasonable approximation of the learning process, and allows the internal computations and structure of the network to be treated as a 'black box'[17].

[16] Is it a member of the set of customers with ....

Network topologies and learning/training methods are problem specific, and it should be appreciated that the choices of network, training style, training data, pre-processing method and activator function all contribute towards the successful application of the network.

Diagram 12 (overleaf) represents, in the form of a flow diagram, the series of discrete steps which make up ANN operation.

[17] Black box: the exact details of the internal workings need not be known to facilitate use.

Diagram 12 - The operation of ANNs - flow diagram

Chapter 4 - Investigation into advantages, disadvantages and current application of ANNs.

Introduction:

In order to allow a more complete consideration of the suitability of ANNs for a given decision making problem, it is necessary to appreciate the advantages and disadvantages that the use of ANNs brings. The self-learning pattern recognition ability which gives the ANN its distinctive characteristics also creates disadvantages, some of which are problem-specific, whilst others are universal.

ANNs are currently used for a wide variety of decision support problems. Perhaps the most commonly cited problem is distress modelling (prediction of bankruptcy), but examples are also found in such diverse areas as production control, new product development and traffic light sequence (road junction) modelling.

Advantages and disadvantages:

Numerous authors have commented on the advantages and disadvantages of ANNs; the following, cited from Hammerstrom (1993), provides what could be regarded as the fundamental points of interest.

Advantages:

• They can infer subtle, unknown relationships from the data.
• They are non-linear so that complex problems can be solved more accurately than by linear techniques.
• They are highly parallel, which makes them run faster on computers with parallel processors than alternative methods.

Additional benefits are given by Medsker, Turban and Trippi (1996) as:

• The ability to cope with highly correlated input data (known as multicollinearity; Tucker, 1996).
• A more highly automated input interface is made possible by the ANN's ability to process all inputs at once.
• Fault tolerance - due to the high number of nodes, inaccuracy caused by bad data can often be localised and need not affect the accuracy of the ANN as a whole.
• Generalisation - noisy, incomplete or previously unseen data will still produce a reasonable response, provided the ANN is suitably trained (Hawley, Johnson and Raina, 1990).
• Adaptability - training can continue during the ANN's in-service lifetime, allowing the ANN to remain up to date.

Hawley, Johnson and Raina (1990) comment that ANNs, owing to the general-purpose nature of their structure, are faster to install and maintain than custom-built KBS. Training ANNs, though time consuming, need not be as technically difficult (and therefore as expensive) as writing the program structure of a KBS.

Because of the way in which the relationships between weights are formed, ANNs are not prone to 'crashing' as a result of incomplete or inaccurate data, but are often said to degrade gracefully over time as the weight values alter. This is often cited as an advantage, but it should be borne in mind that something which happens progressively over time may not be noticed until damage has occurred and, without an adequate control process, may not be noticed even then.

Disadvantages:

Hammerstrom (1993):

• They may fail to produce a satisfactory solution because of insufficient data [for training] or because no learnable (sic) function exists.
• They may produce results from a complex machine learning procedure that has no straightforward cause and effect origin that can be easily explained.
• They can be slow and expensive to train.
• ANNs' computational speed, in the finished application, depends linearly on the number of connections and, roughly [approximately], on the square of the number of nodes.

To this list of disadvantages should be added the most criticised fault of ANNs:

• ANNs are not capable of demonstrating the logical reasoning behind the result obtained - the black box approach (Farrar, Tucker and Bugmann, 1997; Hopgood, 1993; Medsker, Turban and Trippi, 1996; Tucker and Farrar, 1996).
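The link between network size and computation noted by Hammerstrom (1993) can be made concrete by counting connections. In a fully connected feedforward network every node is joined to every node in the next layer, and each connection needs one multiplication per forward pass; with n nodes in each of two adjacent layers there are n × n connections between them, which is why the cost grows roughly with the square of the number of nodes. The helper below is a hypothetical illustration that ignores bias terms.

```python
def count_connections(layer_sizes):
    """Weighted connections in a fully connected feedforward net: one per
    pair of nodes in adjacent layers (bias terms are ignored)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


print(count_connections([3, 4, 4, 2]))  # 36 (the network of Diagram 10)

# Doubling the width of every layer quadruples the connections.
for width in (10, 20, 40):
    print(width, count_connections([width, width, width]))
# 10 200
# 20 800
# 40 3200
```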

A problem recognised by Tucker (1996) and Tucker and Farrar (1996) is the relative 'youth' of ANNs as a decision making technique. Conventional decision making techniques have the advantage of many years of testing, both theoretical and practical, which have produced models that are both recognised and accepted.

ANNs, being relatively new and underdeveloped, do not have the weight of past experience to support their results. However, as development within the field progresses, refinement of the technique and accumulating empirical evidence should produce an improved methodology and greater statistical evidence of accuracy.

Current application of ANNs:

Hawley, Johnson and Raina (1990) provide a comprehensive discussion of ANN applications in finance, from which the following is adapted:

Corporate Finance Applications:


Financial Simulation: Whilst the financial management tasks of a company can be divided into various smaller and more manageable segments, the complexity of the company's internal and external environment is often misrepresented in these simplified models. ANNs provide a means of linking all segments together during analysis; they can be tailored to an individual company and are capable of being dynamic and responsive to change (Donaldson, 1996).

ANNs can be used for credit customer behaviour modelling, planning bad debt expenses, planning the cyclical expansion and contraction of accounts, evaluating credit terms and limits, cash management, evaluation of capital investments, asset and personal risk management (insurance), exchange risk management, and the prediction of credit cost and availability based on a company's financial data. Hawley et al (1990) claim that this area of application offers perhaps the greatest potential for ANNs in business.

Prediction: When determining a new policy, direction or product, organisations need to establish the reaction this choice will create in both present and future investors and the subsequent effect this will have on their investment decisions. Whilst conventional decision making techniques are often more cost efficient at solving problems which have a well-identified theoretical underpinning, the problem of investor behaviour and sentiment is of a complexity better suited to ANNs.

Investors often base their decisions on a wide range of issues and information concerning both the company and the broader economic environment. It is possible to train ANNs to mimic the behaviour of investors (using actual investors as training models) and then determine the effect that alterations to company policy and financial position have on their investment decisions. ANNs offer the opportunity to incorporate a wide range of input and output information, enabling the decision maker to gauge reaction to change in ways other than alterations in stock price alone (Hawley et al, 1990).

Evaluation: Accurately valuing a target company's net worth before attempting an acquisition increases the probability of success, both in terms of acquisition outcome and eventual profit. In this instance, ANNs are trained by exposure to sets of target company data as input, with human expert value estimates as the desired response. This use of ANNs seeks to copy the behaviour of individuals, including the incorporation of human "hunches" and intuition, which would make the use of conventional decision support programming difficult or impossible.

ANNs have been used successfully in a wide range of evaluation problems and confer advantages including: screening large numbers of companies for potential undervaluation or other forms of acquisition attractiveness, minimising decision makers' time (as they then need only look at "ideal" companies); the ability to copy the interpretations of a wide range of decision makers; and the ability to adjust automatically to the decision maker's changing analytical procedures and selection criteria over time (Hawley et al, 1990).

Credit Approval: Using a similar training approach to the company evaluation method detailed above, ANNs are capable of reducing time and labour by mimicking the decisions of financial staff in both credit approval and credit limit decisions. In addition, ANNs are able to interpret a wider range of financial statements (provided they are trained on a wide range) more quickly than their human counterparts, removing the need for the information to be restated in a standardised form (Jenson, 1992 and Marose, 1990).

Financial Institutions Applications:

Assessing Lending/Bankruptcy Risk: ANNs can provide expert opinion on loans and lending arrangements to financial institutions in a similar manner to the credit approval example discussed above.


Security/Asset Portfolio Management: Due to the unstructured nature of the portfolio manager's decision making process and the diversity of information involved, ANNs offer advantages over conventional decision making techniques (Hawley et al, 1990).

Pricing Initial Public Offerings (of ordinary shares): Determining the issue price of ordinary shares is a complicated process, but one which it is essential to optimise (Brett, 1991). The information is often diverse and of a non-standard format, so the application of ANNs confers advantages not found in conventional decision making techniques, through their ability to generalise and their lack of reliance on an explicit rule base.

Professional Investors Applications:

Identification of Arbitrage Opportunities: By replicating an expert decision maker's reasoning process (a process he or she may not be able to articulate), the ANN is able to assist in identifying companies which are about to become the targets of a hostile take-over (thus allowing purchases of the company's shares to be initiated). ANNs offer advantages in their ability to screen large numbers of potential targets, giving the arbitrageur a smaller workload.

The Technical Analyst[18]: The pattern recognition abilities of ANNs enable patterns within stock markets (hitherto incalculable) to be emulated, and through this ability more accurate predictions of share price movements can be derived (Davidson, 1996).

The Fundamental Analyst: Industry norm patterns, market conditions and financial statements can be used to train ANNs to assist in share purchasing in a way similar to the technical analyst model.

Summary:
The advantage which ANNs confer over more traditional decision making/support techniques is their ability to discern patterns in large volumes of data through a process of self-learning, as opposed to explicit instruction. This process enables ANNs to discover patterns or relationships which may have been overlooked, or given too great an emphasis, in existing decision support mathematics. The ability of ANNs to identify patterns also enables handwriting, voice and image recognition, and provides opportunities for a wide range of security applications.

[18] Influences on share price which are not related to company trading position.

Currently the use of ANNs in business is predominantly within bankruptcy prediction and financial risk assessment. However, ANNs have been used successfully in a wide range of operations management, marketing (data mining) and personnel applications.

The use of ANNs has been shown to offer increased accuracy over conventional techniques in some applications (Farrar, Tucker and Bugmann, 1997), and in many instances ANNs are among the few methods currently available (e.g. handwriting and voice recognition). Their ability to deal with non-linearity more easily than conventional methods (Waldrop, 1992) gives the user an advantage in the highly non-linear markets in which business operates (Cuthbertson and Gripaios, 1996).

ANNs "operate by a logic known only to themselves" (The Economist, 1995). The most difficult obstacle to overcome in the promotion of ANNs as a decision making tool is their lack of interpretability.

Chapter 5 - Schema for the assessment of the suitability of ANNs for a given problem
Introduction:

Whilst ANNs are an extremely powerful tool, they are not suited to every problem. For reasons such as cost, unavailability of data and form of data, there are certain problems which are better suited to other forms of decision support. In order to maximise the benefit gained from ANN application, a process of problem/method pre-selection is required.

To facilitate the matching of problem[19] and solution, a schema has been developed. By answering a series of relatively simple questions, users are able to determine whether ANNs are suitable for the problem they are attempting to solve. In addition, the reasons both for and against the use of ANNs for a given problem are discussed, and a suggestion is given as to other methods which may prove more suitable should ANNs not be appropriate.

Schema:

Diagram 13 (page 46) represents the schema in the form of a flow-chart. By following the series of questions, labelled 1 to 10, the user is able to determine whether ANNs are suitable for a given problem and any problems which may be encountered in their application. The dotted lines and boxes represent complementary advice; the solid lines and boxes represent questions, flow and conclusions. In conjunction with the flow-chart, the process is also described fully in text form.

The schema is also available as a computer program; the instructions for use are contained in Appendix 3 and the code listings in Appendix 4.

[19] The schema concentrates on the use of ANNs for decision making purposes; full systems for factory control etc. can cost in the region of £25,000 and would require more detailed analysis than is possible within a general-purpose flowchart (Horridge, 1997).

This program enables an approach which is more dynamic, multidimensional and, above all, simpler for the user than is possible on paper alone. The program (called Net Solver) enables the user to determine whether ANNs are suitable for a given problem. To make this possible, the program records the user's responses to a series of questions. These responses enable the program to calculate the suitability of the problem/decision for ANN use. The result is displayed both as a percentage and as a 'progress bar' of the sort used within the Windows[20] environment to show elapsed time. In addition to this result, the responses are reiterated to allow the user to check that she/he has not made any errors. If the user has entered a project name, this will also be displayed. To help the user determine the next step in the application of ANNs, advice is given where it is thought appropriate. The user then has the option of printing the results, advice and details (see Appendix 5 - Sample Output).
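The scoring actually used by Net Solver is documented in Appendices 3 and 4 and is not reproduced here. Purely as a hypothetical illustration (in Python rather than the Visual Basic of the original program), the sketch below assumes each yes/no question carries a weight and reports the weighted share of answers favourable to ANN use, drawn as a text progress bar. The question wording, weights and bar width are invented for the example and are not taken from the schema.

```python
# Hypothetical questions and weights: NOT the schema's actual questions.
QUESTIONS = [
    ("Is a large amount of historical data available?", 3),
    ("Is the relationship between the variables believed to be non-linear?", 2),
    ("Can the decision be made without an explanation of the reasoning?", 2),
    ("Can the data be expressed numerically?", 1),
]


def suitability(answers):
    """Weighted percentage of 'yes' answers (True = favours ANN use)."""
    total = sum(weight for _, weight in QUESTIONS)
    score = sum(weight for (_, weight), yes in zip(QUESTIONS, answers) if yes)
    return 100.0 * score / total


def progress_bar(percent, width=20):
    """Text version of a Windows-style progress bar."""
    filled = round(width * percent / 100)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {percent:.0f}%"


answers = [True, True, False, True]
print(progress_bar(suitability(answers)))  # [###############-----] 75%
```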

Throughout the program a fictitious company and telephone number (ABC ANNs on (0110)111222) are mentioned at points where the user may need additional advice. It is intended that, with further development, this program could be used in one of the following ways:

• Distributed free of charge (e.g. via the Internet or computer magazine disks) by
a software/consultancy company, replacing ABC ANNs with its own name and
contact number as part of an advertising campaign.
• Incorporated into ANN software as an introduction.
• Sold as consultancy software.
• Used within education as a teaching aid.
• Distributed free of charge, via the Internet, as a philanthropic act.

The program was written using Microsoft Visual Basic Version 4 Professional (VB4) under the Microsoft Windows 95 operating environment, on a PC equipped with a 486/66 processor, 8 MB of RAM and 200 MB of spare hard disk space (see

[20] Microsoft, Windows and Visual Basic are all registered trademarks of the Microsoft Corporation.
