The Application of Artificial Neural Networks to Business Problems

Bryan Mills
University of Plymouth Business School (Franchised to Cornwall College)

Honours project submitted as partial fulfilment for the degree of BA Honours in Business Administration

Supervisor: Dr Jon Tucker
14th May 1997
Acknowledgements:

Firstly, I would like to thank my supervisor, Dr Tucker, for his patience and generosity and, in addition, to acknowledge the contribution he has made to this dissertation. I would also like to thank Dave Ager, Jill Ferret and Mike Trennary for their tolerance and encouragement during both the dissertation and the degree programme. I would like to acknowledge the encouragement I have received throughout the degree from Buzz Banks, Helen Cobbin and Ken Waller. Also, I would like to take the opportunity to thank Paul Ingram for his uncompromising and contagious obsession with academia and Tony Butt for first introducing me to Chaos Theory and non-linearity.
Bryan Mills 1997

Abstract:

Artificial Neural Networks (ANNs) provide a powerful, information technology based tool for decision making. However, the present literature on the subject is often found to be either inaccessible or of limited relevance to (general) business application. In this report ANNs are described in a more intuitive manner than is found within much of the existing literature. Emphasis is placed upon the use of ANNs within the business environment, although the study still provides an introduction for wider application. Misconceptions surrounding ANNs, and Artificial Intelligence in general, are explored, and recommendations are made with a view to their resolution. The advantages and disadvantages of ANNs are discussed, and present applications are listed to demonstrate the range of application possibilities for ANNs. To enable wider application of ANNs within business, and to reduce misguided application, a schema has been developed. This schema, implemented as both a flowchart and a computer program, allows the potential ANN user to critically appraise the use of ANNs for a given decision making problem.
Contents:

Acknowledgements
Abstract
List of Diagrams
List of Tables
Glossary of Terms
Chapter 1 - Introduction
  General Introduction
  Popular Misconceptions Concerning Neural Networks
Chapter 2 - Discussion of Aims, Methodology and Research Philosophy
  Aim
  Objectives
  Benefit of the project to industry and commerce
  The growth of research in the neural area
  Methodology and Approach
  Schema development
Chapter 3 - Explanation of the Fundamental Concepts of ANNs
  Introduction
  An outline explanation of the fundamental concepts of Artificial Neural Networks
  Knowledge Based Systems
  The difference between Artificial Neural Networks and Conventional Knowledge Based Systems
  Explanation of the operation of ANNs
  First Principles
    Knowledge Based Systems
  Rule Based System
  Artificial Neural Networks
    Overview
    Components
    Nodes
    Weights and bias terms
    Generalisation
    Choice of mapping or activator function
    Data pre-processing
    Training
    Topology
  The Multilayer Perceptron - an example of supervised learning/training
  The Kohonen self organising net - an example of unsupervised learning/training
  Summary
Chapter 4 - Investigation into advantages, disadvantages and current application of ANNs
  Introduction
  Advantages and disadvantages
  Current application of ANNs
  Summary
Chapter 5 - Schema for the assessment of the suitability of ANNs for a given problem
  Introduction
  Schema
  Explanation of Schema
  Summary
Chapter 6 - Conclusions and Recommendations
  Conclusion
  Limitations
  Further Research
Appendix 1 - Example of Training Process
Appendix 2 - Bayesian Updating
Appendix 3 - Instructions for Running the Computer Program
Appendix 4 - Computer Code-list
Appendix 5 - Sample Output
Appendix 6 - Visual Basic as a Programming Language
Bibliography

List of Diagrams:

Diagram 1 - Patent Activity
Diagram 2 - Methodology
Diagram 3 - Knowledge Based System
Diagram 4 - Single Neuron Calculation
Diagram 5 - Representation of a Neuron
Diagram 6 - Screen dump of a text file for use in WinNN
Diagram 7 - Class Membership
Diagram 8 - Universe of objects
Diagram 9 - Sigmoid Function
Diagram 10 - The Multilayer Perceptron
Diagram 11 - Kohonen Self Organising Feature Map
Diagram 12 - The operation of ANNs - flow diagram
Diagram 13 - Schema

List of Tables:

Table 1 - Sample Problem
Table 2 - Simplified weight method
Table 3 - Input file explanation
Table 4 - Sigmoid values
Table 5 - Data pre-processing
Glossary of terms:

Activator function - An equation (mapping function) which describes a neuron's internal state as the total of its weighted inputs: $\mathrm{net} = \sum_i x_i w_i - \theta$, where $x_i$ is an input, $w_i$ is the weight applied to it and $\theta$ is the bias term.
Algorithm - A procedure or series of steps used to solve a problem.
Autoassociative - Mapping noisy or incomplete data back onto the original pattern.
Backpropagation - An algorithm which compares results with expected answers and then passes the difference back through the network to facilitate weight adjustment.
Bias term - A systematic offset (θ) introduced to each node independently, allowing the threshold of that node's output to be controlled.
Cell - A neuron.
Database - In this instance, a set of facts (data) stored within a computer system.
Dependent variable - A variable whose value is altered or created by a change in the value of an independent variable(s). Normally shown on the left hand side (LHS) of an equation.
EPOS - Electronic Point of Sale: the computer connection between cash tills and the central computer within a retail store.
EPS - Earnings per share (accountancy measure).
Front-end subsystem - A computer program designed to simplify (humanise) the input and output of data.
Fuzzy - Describes a set whose members belong to it to some degree. In contrast, a standard set contains its members on an all-or-none basis (Kosko, 1993).
Function - A rule which maps an element of one set onto an element of another set; e.g. sales level could be said to be a function of demand.
Generalise - The ability to identify a wide range of objects, patterns etc. from a minimal set of key descriptive data.
Heteroassociative - Mapping an input pattern set to a different output pattern set.
Hyperplane - A plane in a space of more than three dimensions, and therefore difficult to represent graphically.
Independent variable - A variable whose change in value alters or creates a change in the value of a dependent variable(s). Normally shown on the right hand side (RHS) of an equation.
Inference engine - The part of a knowledge based system's programming which deduces results from given facts/data.
Knowledge based system - A system in which data and control (algorithms) are separated, allowing the computer to respond to a series of differing inputs by calling on a library of information (knowledge base) as opposed to altering variables contained explicitly within the program's structure.
Mapping function - A rule linking the elements of one set to those of another; usually shown as $F: x \to y$, the function which maps $x$ onto $y$.
Multivariable - Containing a large number of independent variables.
Network - A collection of interconnected nodes forming a topology.
Neuron - A single activator function: a processing element, a mapping function through which variables must pass, a calculation point.
Nodes - Neurons.
Non-linearity - A property of equations containing powers, roots, trigonometric or logarithmic functions.
Normalisation - A form of data pre-processing which seeks to give all inputs/outputs a commonality by constraining their values to within a pre-determined range.
Pre-processing - Alterations to data before use (normalisation, removal of outliers, ratio splitting), usually conducted with the intention of increasing the network's efficiency or of converting non-numeric data to numeric.
Propositional logic/calculus - A step by step inference system for determining whether a given proposition is true or false. There are various forms of propositional inference (modus ponens, modus tollens, denial of the antecedent etc.), but all are based on variations of: if x is true then y must be true/false; if and only if x is true then y is true/false; etc. (Eysenck and Keane, 1995).
Ratio splitting - Using the component parts of a ratio separately as opposed to using the result (for GP Margin = GP/Sales, use GP and Sales as inputs, not GP Margin).
Real-time - The collection and processing of data as events occur, as opposed to the use of historic data. EPOS works in real-time.
ROCE - Return on capital employed (accountancy measure).
Set - A collection of elements defined by a rule which makes them separable from other sets; e.g. men and women are two separate sets (separated by sex) but are also within the common set of humans (separated from other animal forms by species).
Sigmoid - A common ANN activator function. An equation which reduces any summed input to a value lying strictly between 0 and 1 (tending towards 1 for large positive inputs and towards 0 for large negative inputs, but never reaching either), and is generally given thus: $f_{\mathrm{nonlinear}}(x) = \frac{1}{1 + e^{-x}}$, where $x$ is the summed input and $e$ is the mathematical constant that is the base of natural logarithms (2.71828...).
Topology - In this instance, an attempt to graphically represent the interconnection of nodes within the network. Topology is often one of the key distinguishing features separating different ANNs (the others being training method and activator function).
Training method - As ANNs self learn by exposure to data, it is necessary to have an algorithm which allows the ANN to distinguish between correct and incorrect responses. This may be supervised (told when incorrect and what the output should have been), unsupervised (self learning pattern recognition) or reinforced (told simply whether correct or incorrect).
Training set - A collection of data used to train the ANN. The available data is usually separated into a training set and a hold-out (test) set.
Vector - A quantity which has both magnitude and direction. An ANN's input consists of a one-dimensional array (vector) of differing values $x_1, x_2, \dots, x_n$, which each node combines with its weights in the form $x_1w_1 + x_2w_2 + x_3w_3 + \dots + x_nw_n$, where $x$ indicates an input and $w$ a weight.
Weights - Values which are altered by the ANN so that the emphasis of the variable upon which each acts can be either strengthened or weakened; a variable coefficient which determines the strength of an input's effect on the output.
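Several of the glossary entries above (normalisation, the activator function, the sigmoid, supervised training and backpropagation) fit together into one small calculation. The Python sketch below is an illustration only, under stated assumptions: it is not taken from this study's own program, every function name is hypothetical, and it trains a single sigmoid node on a hypothetical toy data set (output 1 only when both inputs are 1) using the delta rule, which is the single-node case of backpropagation with a squared-error measure. The learning rate, tolerance and epoch limit are arbitrary illustrative choices.

```python
import math

# Hypothetical helper names chosen for illustration; not from the dissertation.


def normalise(values, lo=0.0, hi=1.0):
    """Min-max normalisation: rescale values linearly into [lo, hi]."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        return [lo for _ in values]  # a constant input carries no information
    return [lo + (v - v_min) * (hi - lo) / span for v in values]


def net_input(x, w, theta):
    """Activator function: net = sum(x_i * w_i) - theta."""
    return sum(xi * wi for xi, wi in zip(x, w)) - theta


def sigmoid(net):
    """Sigmoid: 1 / (1 + e^-net); the output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-net))


def train(samples, n_inputs, rate=1.0, tolerance=0.2, max_epochs=20000):
    """Supervised training of one sigmoid node with the delta rule.

    Each sample is (inputs, target). Training stops once every error seen
    during an epoch is below `tolerance`, or after `max_epochs` passes.
    """
    w = [0.0] * n_inputs
    theta = 0.0
    for _ in range(max_epochs):
        worst = 0.0
        for x, target in samples:
            y = sigmoid(net_input(x, w, theta))
            error = target - y  # supervised: the correct answer is known
            worst = max(worst, abs(error))
            delta = error * y * (1.0 - y)  # error scaled by the sigmoid's slope
            w = [wi + rate * delta * xi for wi, xi in zip(w, x)]
            theta -= rate * delta  # net falls as theta rises, hence the minus
        if worst < tolerance:
            break
    return w, theta


# Hypothetical toy training set: target is 1 only when both inputs are 1.
and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train(and_samples, n_inputs=2)
for x, target in and_samples:
    print(x, target, round(sigmoid(net_input(x, weights, bias)), 3))
```

The tolerance is deliberately loose (0.2) because, as the sigmoid entry notes, the output never actually reaches 0 or 1, so a tolerance of zero could never be met. The toy inputs are already 0 or 1, so `normalise` is included only to show the pre-processing step; for example, `normalise([2, 4, 6])` gives `[0.0, 0.5, 1.0]`.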
Chapter 1 - Introduction

General Introduction

Business involves a complex mix of people, policy and technology, and exists within the constraints of economics and society (Clifton and Sutcliffe, 1994). It is often the precise way in which these items are mixed that creates either success or failure for an organisation. This presents the manager with two key tasks: the efficient collection of all relevant information, and its analysis. From this analysis the manager will be able to formulate strategies, define objectives and implement plans for their fruition. The provision and analysis of information within business is often referred to as the decision support process, and the methodology adopted is referred to as decision support systems (DSS).

Business decisions can often be viewed as the solution of various mathematical problems. Whether it be determining the price level of a product, the benefit of expansion into a new market, staff levels or the probability of a project failing, mathematics usually plays a role. In fact, given the overriding objective of "maximising shareholders' wealth" (McLaney, 1994) found within all profit making organisations, and since wealth/profit is measured numerically, it would be difficult, if not impossible, to view the organisation meaningfully in any other way.[1]

[1] Non-profit making organisations seek cost efficiency, another mathematical measure.

One of the key problems in any decision is the availability and cost of "perfect information". Given perfect information (all the facts concerning a decision, with complete confidence that these predictions are correct) there would be little for the business manager to decide; it would simply be a choice of the project which maximised overall contribution.[2] Most decisions, however, are not based on perfect information. This is generally due to a combination of the prohibitive cost of gathering such information, the availability of information, and the intrinsic unpredictability and complexity of the markets in which business operates.

Ongoing developments in the field of Information Technology have enabled the gathering, storage and retrieval of much larger quantities of information than was previously possible. Stock markets can be observed in real-time, supermarkets know the exact quantities of goods on their shelves (via Electronic Point of Sale (EPOS)) and their customers' weekly shopping lists (via loyalty cards), and companies can measure the exact output of machines on the shop floor (via Computer Aided Manufacturing). This information is, however, worth only as much as the gain derived from its ownership. Being able to quote a share price or stock level is fine, but the information has already become historic. What is required in decision making is a means by which to identify patterns and trends in the large volumes of data currently available, and to increase the confidence in the predictability of this data to an acceptable level.

The capabilities Artificial Neural Network (ANN) models have in recognising patterns and trends in large volumes of data have meant that they are being increasingly used for a variety of industrial/commercial applications.

ANNs are a form of computer software which took their original inspiration (McCulloch and Pitts, 1943) from man's limited understanding of the workings of the human brain. Research has been carried out in this area for two broad reasons. The first, and original, was an attempt to model the human brain electronically to develop a greater understanding of its operation.
The second, and most relevant in this2 Overall Contribution - the manager would consider the organisation’s other ventures, market share, market growth and longterm survival in his/her decision 12
  13. 13. Bryan Mills 1997instance, is the development of these models as a mathematical tool for studyingpatterns and relationships in data. The mathematics which form these models are particularly useful whendealing with non-linear problems, problems which cannot be graphed by use of astraight line, of which there are numerous examples in business (demand/price,production level/cost, share price/ROCE/EPS - an increase in the independentvariable (price) does not guarantee a proportionate increase in the dependant(demand)). ANNs are also capable of dealing with dependant variables which mayhave several variables acting on them (e.g. interest rates, inflation and estimation ofrisk - in cost of capital calculation), the relationship between each being boththeoretically appreciated and explainable but not easily converted into an equation oralgorithm (Klimasauskas, 1991 and Scocken, 1994). It is the ability to deal with non-linearity, multivariables and large volumes ofdata which gives ANNs what is perhaps their most impressive features - patternrecognition and self learning. ANNs receive their information (their knowledge) via aprocess of training. Sets of data and desired results are passed through the networkuntil the computer is able to create, to a reasonable degree of accuracy, the desiredresult. This is made possible by the networks ability to generalise the training datapresented to it and form an output, given new inputs, based on this generalisation.Once this training stage is complete a problem (independent variables) can be inputand a result (dependent variable) is generated. Current application of ANNs includes, amongst others; stock and moneymarket forecasting (Trippi and Turban,1996), face and handwriting recognition(Rogers, Kabrisky, Ruck and Oxely, 1994), recognising whether station platforms arebusy or not, missile direction systems, voice recognition, voice control of computers, page 13
data mining (Wiggins, 1994), industrial signal processing (Wiggins, 1994), modelling of traffic flow (Recker, 1995), human resource management (redundancy selection) (Coit, 1996), new product feasibility studies (Madu, 1995), risk evaluation, chemical analysis, weather forecasting and resource management (Davalo and Naïm, 1991), a complement to business decision support systems (Scocken, 1994), operations quality control (Horridge, 1997) and the processing of marketing data.

Popular Misconceptions Concerning Neural Networks:

The subject of Artificial Neural Networks (ANNs) is an example of a name not being self-explanatory. The description ‘Artificial Neural Network’ is a misnomer: it suggests an artificial representation of the human mind (the mind being composed of a network of neurons). Exciting though the creation of an ‘artificial mind’ would be, the ANNs currently in operation are little more than computer programs capable of doing clever ‘sums’. The cleverness of these ‘sums’, however, is not to be taken lightly. Systems have been developed which are able to identify patterns in very large samples of data and to calculate relationships between data where conventional mathematics would have been inadequate for practical application, and they represent a very strong possibility of developing systems better suited to understanding our own fuzzy³ world.

As a subject, ANNs are fairly inaccessible and fraught with misconceptions. The subject is clouded by two separate, but interrelated, explanations, and this difficulty is further compounded by the absence of accessible knowledge. On the one hand there are the works of various academics and academic institutions. On the other is the general public’s⁴ understanding of what ANNs represent, if they are aware of them at all.

³ Fuzzy - e.g. language: hot, warm and cold mean different temperatures to different people, and the boundary between hot and warm (for example) is not clear (is 18 degrees warm, 19 degrees hot and just as hot as 28 degrees?).
⁴ Used here simply to describe those outside the fields of Mathematics, Computing and Psychology - not intended to be in any way derogatory.

The public’s understanding stems mainly from the world of science fiction and ‘popular’ science programmes. It is a world of Arthur C. Clarke’s HAL (2001 etc.) and Philip K. Dick’s Blade Runner⁵: thinking machines which inevitably turn on their creators, with devastating results⁶. This understanding is not assisted by the anthropomorphic nature of the language surrounding ANNs and the willingness of some academics to emphasise this definition (for example Professor Aleksander, Imperial College London: “Magnus [a computer program] has a mind of its own” (Millar⁷, 1996)). Words such as ‘thinking’, ‘neuron’ and ‘understanding’ all point towards machines which, eventually, may replicate the human thought process to the point of being conscious. The reality of the situation is quite different: at present computers can represent little more than a few thousand neurons, compared to 10,000 in a cockroach’s brain and 100 billion in a human’s (The Economist, 1995).

The academic world often uses anthropomorphic terms to overcome some of the limitations of language and the mathematical nature of more correct descriptions. For example, in the development of a computer system to control the heating, lighting and ventilation of an office building one may be tempted to use expressions such as “to develop a system which is aware of its environment”. However, use of the word ‘aware’ may suggest consciousness, and use of ‘its environment’, as opposed to ‘the environment in which it operates’, could suggest ownership and, therefore, existence beyond being an object. The difficulty stems from the absence of a more correct, and equally convenient, shorthand.
The alternative, “to develop a system which constantly monitors the surrounding environment and compares this information with a pre-programmed set of ‘ideal’ conditions”, explains the process with a reduced likelihood of confusion, but is not necessarily more accurate. The reader’s frame of reference provides the key to which language would be more appropriate.

⁵ More correctly, the original book was called Do Androids Dream of Electric Sheep?
⁶ The defence analyst and writer Warwick Collins has gone so far as to call on the government to restrict the human attributes scientists can give programs/machines (Millar, 1996, The Guardian Newspaper (17/12/96), page 4, eighth paragraph).
⁷ The Guardian Newspaper, 17/12/96, page 4, second paragraph.

The use of such terminology creates few problems within the field because the level of understanding is such that the words used often have two separate meanings, the computer related meaning and the human related meaning, for example:

Neuron -
• Human related meaning - a cell which responds to various inputs by producing responses - a processing unit.
• Computer related meaning - a part of an ANN computer program which performs a calculation - a processing unit.

The definitions are similar and would appear to suggest that, if a significant number of ‘computer neurons’ were assembled, a human brain could be replicated. Whilst this formed the inspiration behind some of the early research in the field (for example Rosenblatt 1958, 1961), modern theory points to a level of complication within the human brain which makes the early optimism seem naive at best.

A more comprehensive discussion of human and machine consciousness is found in Penrose, 1988, The Emperor’s New Mind, and 1994, Shadows of the Mind.

This thesis is intended to explain Artificial Neural Networks in such a way as to reduce some of the confusion which often surrounds the topic. In addition it is intended to simplify the application of ANNs (to a given problem) by the development of a schema (both paper based and as a computer program). This schema will greatly simplify the choice faced by the manager when considering which mathematical tools to use in decision, classification and control problems. To enable the full value of this schema to be realised the thesis begins with a comprehensive review and simplification of existing literature. As previously discussed, the confusion stems from three broad areas - media hype, anthropomorphic descriptions and texts aimed at a specialist (scientific) reader - and it is intended that this thesis will contribute towards redressing this balance.
Chapter 2 - Discussion of Aims, Methodology and Research Philosophy

Aim:

This study aims to develop a level of understanding from which the business manager (who is unlikely to be an IT specialist) can establish the relative merits and demerits of the ANN technique for business decision support analysis.

The project aims to make inroads into some of the more accessible academic texts with a view to creating a more intuitive guide to ANN use aimed at the business manager and student. To aid this explanation a schema or system will be developed whereby the reader can assess the suitability of ANNs for a problem they wish to solve. To assist in the discussion on the suitability of ANNs for given problems there will also be an assessment of current uses and the advantages and disadvantages that their application presents.

Objectives:
1) To conduct a literature review of the fundamental concepts underlying ANNs.
2) To examine the existing use of ANNs.
3) To develop a system to enable problems to be assessed for the suitability of ANN application.

Benefit of the project to industry and commerce:

Progress in the development of ANNs is closely tied to the development of computer equipment. It is only within the past 5 years that computing power has become cheap enough to make ANN use a viable possibility. However, ANNs have remained in the exclusive domain of the scientist and mathematician for the past 45 years and there are few accessible texts for the non-specialist.

The field of ANNs contains possible solutions to business problems not fully addressed by present mathematical techniques (Tucker, 1997). As Gleick (1993) and Waldrop (1992) have commented, non-linear patterns are rife in the enormous volumes of information produced by industry and commerce (e.g. the financial pages, actuarial data, market research responses). ANNs enable the user to analyse this data more accurately than traditional problem solving techniques, making them a commercial advantage to many industrial sectors.

The growth of research in the neural area:

The field of ANNs is expanding at an amazing rate. The expansion of the subject is closely linked to technological developments in the IT field. As this area continues to develop⁸ there will be an increasing expansion of opportunities in the field of ANNs (Medsker, Turban and Trippi, 1996). Funding of research within the field of ANNs is continuing, with the Japanese government having budgeted $250 million over the next 10 years, and the US government having pledged research funding of $400 million over the next 6 years (The Economist, 1995).

Diagram 1 - Patent Activity (Patents Registered USA: number of patents per year, 1986-1992, for ANN, Comp. Int and Combined)

⁸ Moore’s Law suggests a doubling of the number of transistors on a chip every 18 to 24 months (J. Scholfeild, 1996, The Guardian Newspaper (31/10/96), page 3, Online Section).
Diagram 1 shows patent activity in the USA for the years 1986-92. It can be seen from the graph that the growth of work within this field is almost exponential. It is also important to note that the full extent of ANNs’ application within business (particularly finance) has yet to be realised (Farrar, Tucker and Bugmann, 1997).

Methodology and Approach:

The project is based mainly on a comprehensive literature survey and review of texts within the field of ANNs. The literature search was conducted in the first instance to develop a clear understanding of the subject. From this, a succinct explanation of the concepts underpinning artificial neural networks, aimed at business managers, has been produced. The greater understanding engendered by the literature research provides the basis for an analysis of the advantages and disadvantages of the use of ANNs and forms the foundation of the schema development.

Diagram 2 - Methodology
The above chart (Diagram 2) represents the flow of tasks from development of the original synopsis to the conclusions and recommendations.

Schema development:

The schema, which forms the most pragmatic part of the thesis, was developed from the literature research. The schema seeks to answer the question “Do ANNs offer a realistic solution to a given problem?”. As ANNs are capable of dealing with a variety of problems, and as the business community usually has a variety of different problems under review, it is intended that the schema will be general in its approach, whilst maintaining effectiveness and accuracy.

The schema is developed both as a flow-chart and as a computer program. By establishing the specific data and training requirements of ANNs it is possible to construct a series of questions of a non-technical nature, which the manager can consider concerning the problem under review. The schema follows the flow of the responses and culminates in a suggestion for further action. The reasons for the suggested actions are explained, allowing the manager to consider various courses of action depending on the resources available to him or her. Where appropriate the schema will suggest alternative decision making techniques which could prove more cost efficient or accurate than the use of ANNs.
Chapter 3 - Explanation of the Fundamental Concepts of ANNs

Introduction:

This chapter will seek to place ANNs in the broader context of computer software. A highly simplified description of the workings of ANNs will follow. Once this basic understanding has been established a more detailed explanation will follow, which is intended to equip the reader with a reasonable level of knowledge on the topic, enabling either further study or practical application.

An outline explanation of the fundamental concepts of Artificial Neural Networks:

An ANN is simply a computer program which, through the adjustment of mathematical weights, is able to create a model capable of producing results (usually in the form 1 or 0, or scaled using decimals from 0 to 1), for a given set of numeric input data, to a reasonable degree of accuracy. The network will often include a front-end subsystem (Attrasoft User’s Guide and Reference Manual, 1996) to enable both data encoding and data decoding:

Data encoding: to convert user-application data to neural input data.
Data decoding: to convert neural output data back to user-application data.

ANNs can be considered as part of the larger group of computer based techniques referred to as Knowledge Based Systems.

Knowledge Based Systems:

There are numerous forms of computer systems which fall under the general heading of Knowledge Based Systems (KBS). This use of computing power can be defined as:
“a system within which data is analysed by comparison with sets of pre-obtained data by following specific rules and/or weighted relationships” (author)

To facilitate this comparison the system will require a set (library, files, historic records) of knowledge. This knowledge is the basis upon which the system operates and can take numerous forms:

• Financial Data - credit limit, accounting ratios, past sales figures
• Human Resource Data - qualifications, age, experience (years)
• Operational Data - machine failures (frequency), tolerances, re-order levels

As can be seen from the above examples, the knowledge base is often a form of database of the sort now commonly found within most organisations. The difference between KBS and conventional databases is the level of interrogation and control which is placed within the system’s remit. As opposed to merely storing data, the system will be called upon to ‘trawl’ through the data to identify trends and patterns of behaviour, or it may use its knowledge to instigate some form of action. For example, if a bill became overdue the system could issue a reminder without the need for an operator to intervene. This is possible because the system knows today’s date, the date the last payment was made, the difference between the two and the company’s policy on ‘debtor days’. This example also indicates the level of understanding possible: the system’s ‘knowledge’, in this instance, is in no way knowledge in the sense that a human would know what it was to have an overdue bill.

It becomes apparent that many modern databases are capable of achieving similar results to knowledge based systems. The difference between the conventional⁹ knowledge based system and databases is becoming increasingly subtle and is more an emphasis on use as opposed to structure. Most databases (Microsoft’s Access, for example) are capable of interrogating data and also of issuing notification should this be required.

⁹ Conventional as opposed to ANNs.

The difference between Artificial Neural Networks and Conventional Knowledge Based Systems:

As previously discussed, ANNs are part of the broad heading of KBS; however, it is important to recognise that there are fundamental differences between ANNs and other KBSs. Whilst a KBS has the rules and relationships concerning its knowledge programmed into the system (albeit kept separate from the knowledge), ANNs develop their ‘own’ rules and relationships through a process of self learning. The self-learning abilities of ANNs are most simply explained by example. Suppose the relationship between the following set of data was desired:

Advertising Spend £’s    100    150    50     10    200
Sales £’s                300    450    150    30    600
Table 1 - Sample Problem

From the above table, by dividing sales by advertising spend (or by drawing a graph), it is quite possible to estimate that sales are three times advertising spend. It is possible to estimate this figure because (a) we appreciate, and could prove, a relationship between the variables and (b) there are relatively few variables, which (c) enables a simplistic approach to the formulation of an equation (relationship). It can be appreciated that a more complex relationship may exist, which is beyond the simplistic approach used so far. To solve a multivariable and non-linear¹⁰ relationship would require the use of statistical techniques which are often complicated and/or rely on a degree of approximation.

¹⁰ Non-linear - a relationship which would create the graph of a curve as opposed to a straight line, the equation of which would contain powers such as x² etc.

ANNs take a different route to establishing the relationship between variables: they adjust the values of numerical weights within an equation (function). The weights act upon the data to alter its value with the intention of producing the desired result. To enable this process to take place the system must be exposed to the data one set at a time (e.g. an advertising spend of £100 and sales of £300 is the first set of data in Table 1). The computer will, in the first instance, apply a guess as to the value of the weights to be used (this starting value may well be pre-programmed or random (Hopgood, 1993)). This ‘guess’ will, inevitably, prove to be wrong and the system will alter the weights and retry. The first set of data will be treated as below:

Advertising Spend £’s    Weight    Function          Result    Desired Result £’s
100                      1         Spend * Weight    100       300
100                      2         Spend * Weight    200       300
100                      3         Spend * Weight    300       300
Table 2 - Simplified weight method

It can be seen from the above that after a series of iterative steps the system was able to produce the desired result, and in this example the weight would be acceptable for all of the data sets.

The function used in the above example is linear, as opposed to the non-linear functions used within ANNs; also, the number and relationship of the variables is more simplistic than would normally be encountered (for an example of the more complicated OR problem see Appendix 1).

It is possible to imagine that if the relationship was more complicated and our weight of 3 proved unsuitable for the next data set it could be adjusted again and then
re-used on both sets until a satisfactory relationship was obtained. Most ANNs have an adjustable degree of tolerance (between the ANN output and the training set’s expected result). For example, WinNN has an adjustable target error which determines the acceptable Root Mean Square (RMS) error¹¹; once the RMS error reaches the target, training of that net is said to be complete. Note that the lower the acceptable error, the more refined, and less generalised, the net becomes.

¹¹ RMS - the square root of the mean of a set of squared numbers.

The procedure described in this simplified model could be said to represent a single neuron (processing unit, cell). To enable more complicated relationships to be developed ANNs have more than one neuron, and it is not uncommon for the results of one neuron to be the input of another. If these connections were viewed pictorially they would form a network of interconnected neurons, and hence: Artificial (non-human) Neural (processing units) Network (interconnection of neurons).

Explanation of the operation of ANNs:

First Principles:

Knowledge Based Systems:

As discussed in the introduction, ANNs are a form of software that has the ability to self learn. Unlike more conventional (rule based) forms of knowledge based systems, the algorithms which enable the inference engine (rule interpreter) to work are not hard programmed as explicit rules following the IF...THEN...ELSE pattern. Instead the program uses a series of mathematical weights to establish data relationships. To understand the difference it is first necessary to explain the basic components within knowledge based systems.

Knowledge based systems contain 3 core components. An interface with the
user (outside world) to enable both data input (keyboard, sensors, etc.) and output (monitor, servos, printout, etc.), a knowledge base (database) and an inference engine (rule interpreter, instructions, ‘main program’). There are two other components often found within knowledge based systems: an explanation module¹² to enable the reasoning behind the decision made to be shown, and a knowledge acquisition module to enable the knowledge base to be built by use of one or more of the acquisition techniques possible (Hopgood, 1993). Diagram 3 illustrates the relationship between these components:

Diagram 3 - Knowledge Based System

¹² ANNs have great difficulty in satisfying this requirement and your attention is drawn to the discussion in Chapter 4.

As shown in Diagram 3, the relationship between the components within a KBS is relatively straightforward. Information is gathered from the outside world, stored within a database and, upon a query being made, accessed to provide an answer.

Rule Based System:

A rule based system is based, fundamentally, on the IF...THEN...ELSE structure (propositional logic/calculus). The following illustrates this point:

IF credit level is greater than pre-agreed limit
THEN stop credit and issue reminder
ELSE do nothing

Where the credit level is computed from inputs and the pre-agreed limit is contained
within the knowledge base. It is both common and desirable that the information required to process the rule is contained explicitly within the knowledge base, as opposed to implicitly within the program, so that a simpler and more robust method of updating can be used (e.g. entries in a database are changed as opposed to altering the program’s source code) (Hopgood, 1993).

Whilst it can be appreciated that this is a simplistic view of the workings of a rule based system, further developments serve only to improve and compound this basic methodology (see, for example, Appendix 2 - Bayesian Updating).

Artificial Neural Networks:

Overview:

The key difference between ANNs and KBS lies with the inference engine. As opposed to having a logic imposed on it, the network is allowed to develop its own logic by means of training, either supervised or unsupervised. Weights are used to determine the strength of relationships and there is no IF...THEN...ELSE. Instead the network decides the relevance of inputs and their interconnections based on its own experience (i.e. how it has been trained).

The network consists of a selection of nodes or cells arranged structurally in a predetermined topology. The nodes are grouped in layers: an input layer, one or more hidden layers and an output layer. Each node accepts various inputs, adjusts them via weights, adds them together and uses the sum to calculate a non-linear function, whose result is passed on to another cell. During training, the result at the output layer is compared with the expected answer and the difference is passed back through the network to allow weight adjustment to correct errors (backpropagation). A simplified single neuron calculation appears thus:
Diagram 4 - Single Neuron Calculation

Pictorially this can be represented thus:

Diagram 5 - Representation of a Neuron

This process is explained in detail below and would normally be performed by numerous neurons/cells/nodes within one or more layers at the same time, i.e. in parallel.

Components:

It is important to appreciate that ANNs gain their ability not from a predetermined layout or selection of weights but from the network’s ability to adjust weights and alter (strengthen/weaken) connections between nodes. Before attempting to explain the mathematics behind these interconnections an explanation of the key components of the network is required.
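The trial-and-error weight adjustment of Table 2 can be sketched in Python as a single one-input ‘neuron’ working through the Table 1 data. This is a minimal illustration under stated assumptions, not how WinNN or any other package trains a net: the starting weight of 1 and the fixed step of 1 simply copy Table 2, and the names `neuron_output` and `trial_and_error` are hypothetical.

```python
# Table 1: (advertising spend in £, sales in £)
TABLE_1 = [(100, 300), (150, 450), (50, 150), (10, 30), (200, 600)]


def neuron_output(spend: float, weight: float) -> float:
    """The single 'neuron' of Table 2: a linear function, result = spend * weight."""
    return spend * weight


def trial_and_error(data, start=1.0, step=1.0, max_tries=100, tolerance=0.0):
    """Guess a weight, compare the result for the first data set with its
    desired result and, while they differ, alter the weight and retry.

    Returns the weight found, the (weight, result) trials made, and whether
    that weight also fits every other data set within the tolerance.
    """
    spend, desired = data[0]
    weight = start
    trials = []
    for _ in range(max_tries):
        result = neuron_output(spend, weight)
        trials.append((weight, result))
        if abs(result - desired) <= tolerance:
            break
        weight += step  # crude fixed-size adjustment, copying Table 2
    else:
        raise ValueError("no acceptable weight found within max_tries")
    fits_all = all(abs(neuron_output(s, weight) - d) <= tolerance for s, d in data)
    return weight, trials, fits_all


weight, trials, fits_all = trial_and_error(TABLE_1)
for w, r in trials:
    print(f"weight {w:g}: result {r:g}, desired {TABLE_1[0][1]}")
print(f"weight {weight:g} fits every data set in Table 1: {fits_all}")
```

Run as written, the loop reproduces the three rows of Table 2 (results of 100, 200 and 300) and then confirms that a weight of 3 fits all five data sets in Table 1. Because the fixed step only ever increases the weight, a first guess above 3 would never succeed; practical networks instead adjust each weight according to the size and direction of the error, as in the backpropagation described in the Overview.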
Nodes:

Medsker, Turban and Trippi (1996) comment that most commercial ANNs have between 10 and 1,000 nodes arranged in three layers, and that although 4, 5 or more layers is not unheard of, it is not deemed necessary for business applications.

Hopgood (1993) describes a node’s role as “to sum each of its inputs, subtract a bias term, θ, and pass the result through a non-linear function, $f_{\text{non-linear}}$, known as the activation function”. Hopgood’s emphasis on the bias term is discussed below. ANNs have sets of these calculating functions, and a description is given by Patterson (1996): “Every ANN is composed of a set n of simple neural computing elements (neurons, units, processing elements or PEs, cells)”, where this set of cells can be given as:

$$C = \{c_i\}, \quad i = 1, 2, \ldots, n$$

Patterson goes on to comment that cells can be grouped into three distinct categories: input, hidden (or interior) and output.

The interior cells are the nodes which perform the majority of the calculation process and are discussed under various headings below (Weights and Bias Terms, Generalisation, Choice of Function). Input cells are the cells which take the initial input of stimuli (discrete keyed values or continuous sensor data), whilst output cells enable the display of results or the control of effectors. The inputs are usually represented by the vector x of n dimensions and the outputs by the vector y of m dimensions (simply put, $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_m$).

The input data often takes the form of a text file in PC based neural nets:
Diagram 6 - Screen dump of a text file for use in WinNN

The above input file demonstrates the relatively simple form of data which may be used in ANN training and operation. The example does not feature scaling of the variables, as this is not required in this instance; however, it does provide a representation of the form input files often take. The file represents 4 training sets, each with 2 inputs and 1 output (4,2,1). In the first training set (case) $x_1$ would be 0, $x_2$ would be 0, and the expected result ($y_1$) would be 0. This would be followed by the second set, which would be 0, 1, 1 respectively, and so on for the third and fourth sets. This data represents the commonly used XOR (exclusive OR) example/problem (Patterson, 1996), which gives the result 1 when exactly one of the two inputs is 1 and 0 when both inputs are the same. The trained network could be used to solve simple yes/no problems, for example:

Account      Purchased over   Arrange visit      Reason
customer?    200 units?       by sales staff?
n            n                n                  Probably not trade customer
n            y                y                  Offer trade account
y            n                y                  Try to increase sales to trade customer
y            y                n                  Credit limit probably reached
Table 5 - Input file explanation

The above example is highly simplified. It does, however, represent the style of business control system which uses yes/no responses. It is important to note that the reason for the decision would not be given by the ANN.
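The data encoding and decoding performed by a front-end subsystem can be illustrated with Table 5: each y/n answer is converted to 1/0, which yields the four XOR training cases described for the WinNN file, and a net output between 0 and 1 is converted back to y/n. This is a sketch only; no network is trained here, and the function names and the 0.5 decoding threshold are assumptions for illustration rather than part of WinNN.

```python
# Table 5 rows: (account customer?, purchased over 200 units?, arrange visit by sales staff?)
TABLE_5 = [("n", "n", "n"), ("n", "y", "y"), ("y", "n", "y"), ("y", "y", "n")]


def encode(answer: str) -> int:
    """Data encoding: convert a y/n answer into 1/0 neural input data."""
    return {"n": 0, "y": 1}[answer]


def decode(output: float, threshold: float = 0.5) -> str:
    """Data decoding: convert a net output between 0 and 1 back into y/n.
    The 0.5 threshold is an assumed cut-off."""
    return "y" if output >= threshold else "n"


training_sets = [tuple(encode(answer) for answer in row) for row in TABLE_5]
for x1, x2, y1 in training_sets:
    # XOR: y1 is 1 when exactly one input is 1, and 0 when both inputs are the same
    assert y1 == x1 ^ x2
    print(x1, x2, y1)

# e.g. a trained net returning 0.93 for one case and 0.07 for another
print(decode(0.93), decode(0.07))
```

Keeping encoding and decoding outside the net itself means the manager works only with y/n answers, while the network sees only the numbers it requires.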
Weights and bias terms:

Once the data is entered into the network, its connection from the input layer to a calculation node is used to apply the weights. Patterson (1996) uses the following notation:

$$\text{net} = x_1w_1 + x_2w_2 + x_3w_3 = \sum_i x_i w_i \qquad \text{(Equation 1)}$$

where x is an input variable and w is a weight. Hopgood (1996) makes the point of subtracting a bias weight to give:

$$\text{net} = \sum_i x_i w_i - \theta \qquad \text{(Equation 2)}$$

whereas Patterson (1996) prefers the use of a bias input fixed at 1, with weight $w_0$, on one of the input links, where $w_0 = -\theta$. Either method is considered acceptable.

The weights remain independent of the variables (x) so as to facilitate their adjustment during backpropagation. It is helpful (Patterson, 1996) to view the relationship between the weights, class membership and the bias value in terms of a two-dimensional plot. In a more complicated example the weight vector ($w_i$) would define a hyperplane in n-dimensional space, where n is the number of input variables. In this example n = 2 and so the plot is two-dimensional.

Diagram 7 - Class Membership
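Equations 1 and 2, and the two ways of handling the bias term, can be checked numerically. In the minimal sketch below the weights ($w_1 = w_2 = 1$) and θ = 1.5 are arbitrary values chosen for illustration, and a point is treated as belonging to the class when net is greater than 0, i.e. when it lies on the positive side of the boundary line on which net = 0. The function names are hypothetical.

```python
import math
from typing import Sequence


def net_hopgood(x: Sequence[float], w: Sequence[float], theta: float) -> float:
    """Equation 2: weighted sum of the inputs minus the bias term theta."""
    return sum(xi * wi for xi, wi in zip(x, w)) - theta


def net_patterson(x: Sequence[float], w: Sequence[float], theta: float) -> float:
    """Patterson's form: an extra input fixed at 1 carries the bias weight w0 = -theta."""
    w0 = -theta
    return sum(xi * wi for xi, wi in zip([1.0, *x], [w0, *w]))


def in_class(x: Sequence[float], w: Sequence[float], theta: float) -> bool:
    """Class membership: True when the point lies on the positive side of net = 0."""
    return net_hopgood(x, w, theta) > 0


W = (1.0, 1.0)  # illustrative weights w1, w2
THETA = 1.5     # illustrative bias term

for point in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    a, b = net_hopgood(point, W, THETA), net_patterson(point, W, THETA)
    assert math.isclose(a, b)  # both bias conventions give the same net
    print(point, a, in_class(point, W, THETA))
```

With these values the boundary line $x_1 + x_2 = 1.5$ has slope $-w_1/w_2 = -1$ and crosses the $x_2$ axis at $-w_0/w_2 = 1.5$, so only (1, 1) falls inside the class. No single straight line of this kind can separate the XOR cases of Table 5, which is one reason networks with hidden layers are needed for such problems.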
The significant points in Diagram 7 are the offset, which is set by the bias weight ($w_0$), and the slope of the line, which is given by $-w_1/w_2$. The boundary is the line on which net = 0:

$$w_1x_1 + w_2x_2 + w_0 = 0 \quad\Rightarrow\quad x_2 = -\frac{w_1}{w_2}\,x_1 - \frac{w_0}{w_2}$$

Thus the position of the line is derived entirely from the weights, and future x values will be shown as either belonging to the class or not. Patterson (1996) places particular importance on this boundary line as he identifies it as the key to the net’s autonomy, through its ability to alter weights and so define what is within the set and what is outside it.

The example shown is linearly separable in that its boundary is defined by a straight line/plane. This is largely due to the simplicity of the example (two-dimensional) and partly due to the fact that it would be intended for use in a single layer network. As an ability to cope with non-linearity is one of the key features of ANNs, they are, of course, capable of dealing with more complicated examples.

Generalisation:

To deal with n dimensions and non-linearity ANNs generalise. Patterson (1996) discusses generalisation in terms of “describing the whole from some of the parts” and points out that the alternative to an ability to generalise is knowing everything. It is possible to identify an object by knowing some general rules involving that class of object without knowing every member of that class. For example, a metal frame with two wheels, a set of handlebars and a saddle, fitting various size requirements, is probably a bicycle. It is not necessary to memorise every manufacturer’s catalogue.

ANNs generalise by creating a class which exists in weight space, with its boundary given by the mapping function F (Patterson, 1996). Mapping functions are either autoassociative or heteroassociative, meaning that they either map the original pattern from noisy/incomplete data or map input patterns to different output patterns, respectively. Mapping functions are shown mathematically as $F: x \to y$. The non-linear boundary can be shown by the simplified diagram:

Diagram 8 - Universe of objects

From Diagram 8 it is possible to see that the boundary established by the network includes both the training set data and other instances of the data not given in the training set but which would be encountered if more sets of data were made available, therefore giving the network an element of flexibility. The boundary must therefore include all examples of the training set and all examples of data corresponding to the net’s function but not known at the time of training, and exclude all other data sets. Once this has been achieved the generalisation can be said to have been a complete success. It is apparent that the method of training and the selection of data will be particularly important to the accuracy of this process.

Choice of mapping or activator function:

As mentioned in the ANN Overview (above), the summed weighted inputs are passed through a non-linear function before proceeding to the next cell or output layer. This is the mapping function referred to above, so that $F: x \to y$, and is known as the activator function (or activation level/summation function - Medsker, Turban and
Trippi, 1996). It is suggested by Patterson (1996) that the choice of activator should be a “monotonic nondecreasing function of net”. This simply means that the output of the function should never fall as net increases: the slope of the function should rise (or at least stay level) from left to right, so that it does not cause values to diminish in relation to other, lower values (for the sigmoid, x = 2 gives y = 0.88 and x = 3 gives y = 0.95). Hopgood (1996) makes the point of stating that “The weights and biases can be learnt, and the learning behaviour of a net depends on the chosen algorithm”. It is further stated that the sigmoid function is most commonly used, and Patterson (1996) concurs with this statement. The sigmoid function is given as:

$$f_{\text{non-linear}}(x) = \frac{1}{1 + e^{-x}} \qquad \text{(Equation 3)}$$

and would appear graphically thus:

Diagram 9 - Sigmoid Function (sigmoid plotted for x from -5 to 5; output axis from 0 to 1)

The above diagram (Diagram 9) shows some of the key features of the sigmoid function and thus its reasons for use. These features include:
• The ability to make all values positive.
• The relatively fine level of discrimination (the slope).
• The fact that all results are given as between 0 and 1.

Data pre-processing:

It can be appreciated that the data under investigation may take various forms. The ANN will require inputs which are of a numeric nature. This does not prevent
  36. 36. non-numeric data being analysed provided it can be converted, with consistency, intonumbers. For example, risk is a common business concept which is regularlytranslated from the vague - safe, moderately safe, risky, very risky - to a range ofprobabilities (say 100%, 75%, 50%, 25% probability of a favourable eventoccurring). Once the data has been gathered, and given a numeric value if required, theefficiency and accuracy of the ANN can be enhanced by information pre-processing.Wasserman (1989) and Patterson (1996) both concentrate on normalisation, which isa common form of pre-processing. Normalisation is a method by which all the databeing processed can be given a common minimum and maximum value. Forexample, readers familiar with statistics may draw a parallel between thenormalisation of the data with techniques used in statistics to determine probabilitiesusing the normal distribution curve(NDC). Here any distribution of data can bemapped (converted) to the NDC which has its probabilities pre-calculated. The most common form of normalisation will see all the data converted tovalues between 0 and 1. This has the advantage of both reducing the difficulty ofmanipulating large numbers (simply put it is easier to manipulate, say, 0.2 than2,000,000) and enhancing the networks ability to adjust weights by reducingunnecessary emphasis (for example in loan calculations interest rates may be given as% or decimals, loan size in millions). Certain activator functions will restrict outputto between 1 and 0, regardless of input (Medsker, Turban and Trippi, 1996). Forexample the sigmoid function mentioned above:Input 1/1+e-x0.5 0.62255 0.993317 0.999999958635 0.9999999935.2 1 page 36
  37. 37. Bryan Mills 1997 Chapter 3 -Fundamental Concepts2256 1-5 0.0067Table 4 - Sigmoid values In can be seen from the above table that data which is beyond a certain rangeapproaches a value of one (exact point at which it appears as one is dependent on thenumber of decimal places used and rounding). As the data is multiplied by weightsand summed before entering the activator function it can be appreciated that noaccuracy is gained by maintaining a mixture of large and small numbers (as it willconvert everything to between 0 and 1 regardless). It can also be recognised that theoutput from the net, having passed through various activator functions, will be ofdecimal or Boolean form (1 or 0). It is important to recognise that as the results forthe sigmoid function return 0 and 1 for a large range of negative and positivenumbers, respectively, it is advisable to restrict the inputs and outputs to valuesbetween 0.1 and 0.9 (Tucker, 1996). This also avoids the use of either 1 which hasthe effect of distorting the part of the sigmoid function (e-1 = 1/e ) or 0 which distortssigmoid (e0 = 1) and has the effect of negating weights (x1w1 = 0 for x = 0). There are numerous methods of data pre-processing and Tucker (1996)contains a accessible description of six common methods. These are given asDistribution truncation and squashing functions, Natural log regression, Ratiosplitting, Positive/negative split, Variable pre-selection, and Data squashing.These can be explain, briefly as:Technique DescriptionDistribution truncation The removal of outliers, the removal of unusually large,and squashing or small, numbers from the data setfunctions page 37
  38. 38. Natural log regression Using logarithms to convert data into small units.Ratio splitting Not inputting the results of a ratio but the numerator (top) and denominator (bottom) values separately.Positive/negative split Separating a variable which has both positive and negative examples into two separate variables (e.g. Profit becomes Profit or Loss as opposed Profit £200 or Profit - £150 (loss))Variable pre-selection Simply manually deciding which variable to include and which to leave out.Data squashing Converting all data so as it is within a pre-defined range (0.1 to 0.9 being given as most appropriate). Adapted from - Tucker (1996)Table 5 - Data pre-processing It is also important to emphasise that the data (variables) used within a ANNshould have some form of theoretical basis for a relationship before they areincluded13. There is little point comparing interest rates, inflation, project life, levelof risk and outside temperature if the problem under consideration was to determinethe ‘cost of capital’ to use in net present value (NPV) calculations .Training: By their very nature, ANNs require training. This training can be viewed in asimilar manner to human training in that the activity is repeated until the systemproduces a satisfactory result (or it is decided to abandon the training run). Hopgood(1993) suggests that the training process is, more correctly, an “error reduction”process. Alternatively Patterson (1996) suggests that it is “adaptive learning in adynamic environment” emphasising the flexibility of the method. There are, however, three separate methods of training - Supervised,reinforced and unsupervised. Wasserman (1989) argues that unsupervised training is13 The appendix of Farrar, Tucker and Bugmann (1997) provides an example of how this could be approached. page 38
  39. 39. Bryan Mills 1997 Chapter 3 -Fundamental Conceptsthe only “biologically plausible” method of training ANNs. The desire to be“biologically plausible” suggests Wasserman is more concerned with the theoreticalstudy and attempted replication of a brain, than practical application of amathematical technique. Plausibility should be weighed against ‘usefulness as a tool’when ANNs are used in applications not related to the study ofpsychology/neurology. The use of unsupervised learning will be discussed under theheading of Topology below (Kohonen being one of its originators). Supervised andreinforced learning methods are broadly similar in philosophy and so an explanationof supervised training will be made first. In supervised training the results obtained from the cells are compared withthe desired results (contained within the original input). The difference is referred toas an error and the weights contained within the network are adjusted(backpropagation). This adjustment continues until the sum of the squares of thedifferences (errors) are minimised (in a way similar to linear regression - line of bestfit calculations).This process can be simplified thus: • Subtract output from target contained within input vector. • Square the difference to remove negative signs14. • Add all the squared differences together. • Compare this answer with the desired level of error. • If error unacceptable adjust weights and begin again.The minimisation is said to be complete when the error has reached an acceptablelevel. Reinforced training follows a broadly similar route to supervised but, as14 Removal of negatives - else error of -100 plus error of 100 would indicate no error. page 39
  40. 40. opposed to calculating the level of error, the ANN is merely informed that it is eitherwrong or right and continues to adjust weights until a correct result is identified.Patterson (1996) suggest that the method is seldom used in practice and attentionshould instead be paid to supervised and unsupervised learning (other authors makelittle or no mention of reinforced training). It should be noted that it is possible to over-train a network. To continue thehuman analogy, this would represent an employee trained, in a vocational way, tosuch a level that they were only able to perform their present role, and none other.This may occur, for example, in a factory where an operative has been doing thesame repetitive job for such a long period of time their ability to transfer any of theskills they have learnt becomes hampered.Topology: The Multilayer Perceptron and Kohonen’s Self Organising Net will be used togive a more detailed explanation of ANN construction.The Multilayer Perceptron - an example of supervised learning/training: The multilayer perceptron is most commonly used for non-linear estimationor classification. It is often referred to as a feedforward network15. This netcomprises of the conventional input and output layers, with a programmer (user)defined number of nodes and layers between (hidden). The number of nodes used inthe input/output layers is data specific. A diagram representing the multilayerperceptron is provided below:15 Generic name used to refer to this network in particular - any network in which the data flows towards the output istechnically feedforward. page 40
Diagram 10 - The Multilayer Perceptron

It is usual for the number of input nodes to equal the number of variables under consideration; likewise, the number of output nodes will equal the number of desired outputs. The number of nodes is often used to give the network its name. For example, the network above may be termed a 3-4-4-2 network, as it has 3 input nodes, two sets of 4 hidden (computational) nodes and 2 output nodes, and would be described as a four layer network (although Hopgood (1993) indicates that this is disputed, as some claim the input layer should not be counted, which would make it a three layer network).

It can be seen from Diagram 10 that each node is connected to every node in the next layer, but that no nodes within the same layer (shown vertically in the diagram) are connected to each other. The weights sit on, and are adjusted at, the connections between the nodes, and each calculation node applies a non-linear (activator) function. The process can be described as: input; weighting; conversion via the activator function, the result becoming the input for the next layer; weighting; conversion via the activator function; and output to the final layer, where error checking occurs and instigates backpropagation to adjust the weights (note that this method uses supervised training).

The number of both layers and calculation nodes is the programmer's decision for a given problem. It is common to start with a small number of layers/nodes and then increase this number until the desired level of accuracy/speed is achieved.

The Kohonen self organising net - an example of unsupervised learning/training:

As mentioned previously, the Kohonen self organising net is a form of net which learns using an unsupervised method. This type of network is most commonly used for pattern recognition and is often referred to as the Self Organising Feature Map (Patterson, 1996). As well as differing from the Multilayer Perceptron (MLP) in training method and application, it also differs in being a single layer feedforward network rather than a multilayer one. In addition, the network works on the principle of "winner takes all" (Wasserman, 1989). This means that as the information is processed within the layer, one, and only one, node will transmit an output. This is why the method is often referred to as a "competitive" one (Patterson, 1996 and Wasserman, 1989).

Diagram 11 - Kohonen Self Organising Feature Map

Unlike the Multilayer Perceptron, the Kohonen net is given no desired results. As it is attempting to replicate the patterns in various sets of data, it trains by continually processing different data sets until each new run reproduces the results of previous runs (within pre-set tolerances). By analogy, a child does not always learn a word by being told it repeatedly and corrected whenever he or she says it wrongly, but rather by being exposed to numerous examples of speech and developing the ability to replicate these patterns independently. This learning style is sometimes referred to as competitive filter associative memory (Medsker, Turban and Trippi, 1996).

It can be appreciated that the differing topologies are related to the different applications to which the networks are applied. Classification (MLP) and pattern recognition (Kohonen) are two quite different problems. An example of classification may be given as "does this data correspond¹⁶ to a customer with a good credit rating?", whereas pattern recognition is more commonly associated with speech or handwriting recognition, or the recognition of trends or patterns within financial information (which may return us to the credit problem by a different route).

¹⁶ Is it a member of the set of customers with ....

In summary, the Multilayer Perceptron works by co-operation and the Kohonen network by competition between nodes.

Summary:

ANNs offer an alternative to conventional forms of knowledge based systems. Whilst ANNs drew their original inspiration from studies of the human mind, further developments along those lines are limited by technology. The use of ANNs as a mathematical tool is, however, both possible and practical at today's levels of technology. Self-learning and pattern recognition provide a solution to problems which, using conventional techniques, may have been overcomplicated or simply not possible. The basic concept of learning by example through the adjustment of mathematical weights is a reasonable approximation of the learning process and allows the internal computations and structure of the network to be treated as a 'black-box'¹⁷.

Network topologies and learning/training methods are problem specific, and it should be appreciated that the correct choice of network, training style, training data, pre-processing method and activator function all contribute towards the successful application of the network. Diagram 12 (overleaf) represents, in the form of a flow diagram, the series of discrete steps which make up ANN operations.

¹⁷ Black-box - The exact details of the internal workings need not be known to facilitate use.
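The sequence of steps described in this chapter (pre-processing, weighting, the activator function and error measurement) can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the method of any author cited here: it uses the sigmoid of Equation 3, the 0.1 to 0.9 data squashing recommended by Tucker (1996), a 3-4-4-2 network like that of Diagram 10 with random starting weights and biases, and the sum of squared errors used in supervised training. The function names (`sigmoid`, `squash`, `init_layers`, `forward`, `sum_squared_error`, `count_connections`) and the loan figures are hypothetical, and the backpropagation step that would adjust the weights is deliberately left out.

```python
# Illustrative sketch only: all names and data here are hypothetical.
import math
import random


def sigmoid(x: float) -> float:
    """Equation 3: f(x) = 1 / (1 + e^-x), arranged so large negative x cannot overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def squash(values, low=0.1, high=0.9):
    """Data squashing: rescale one variable's values linearly into [low, high]."""
    vmin, vmax = min(values), max(values)
    if vmax == vmin:  # a constant variable: place every value mid-range
        return [(low + high) / 2.0] * len(values)
    scale = (high - low) / (vmax - vmin)
    return [low + (v - vmin) * scale for v in values]


def init_layers(sizes, seed=0):
    """Random starting weights for a fully connected net, e.g. sizes=[3, 4, 4, 2].

    Each layer is a list of (weights, bias) pairs, one pair per node, with one
    weight for every node in the previous layer."""
    rng = random.Random(seed)
    return [
        [([rng.uniform(-1.0, 1.0) for _ in range(n_in)], rng.uniform(-1.0, 1.0))
         for _ in range(n_out)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]


def forward(layers, inputs):
    """Weight and sum the inputs, apply the activator, pass the result on."""
    activations = list(inputs)
    for layer in layers:
        activations = [
            sigmoid(sum(w * a for w, a in zip(weights, activations)) + bias)
            for weights, bias in layer
        ]
    return activations


def sum_squared_error(outputs, targets):
    """Subtract each output from its target, square, and add the squares."""
    return sum((t - o) ** 2 for o, t in zip(outputs, targets))


def count_connections(sizes):
    """Number of weighted connections in a fully connected feedforward net."""
    return sum(n_in * n_out for n_in, n_out in zip(sizes, sizes[1:]))


if __name__ == "__main__":
    for x in (0.5, 5, 17, -5):
        print(x, round(sigmoid(x), 10))

    # Hypothetical loan records: interest rate (%), loan size (£), term (years)
    records = [[5.5, 2_000_000, 10], [7.0, 250_000, 5], [6.2, 900_000, 25]]
    # Squash each variable (column) separately so loan size cannot swamp interest rate
    columns = [squash(list(col)) for col in zip(*records)]
    squashed = [list(row) for row in zip(*columns)]

    net = init_layers([3, 4, 4, 2])  # a 3-4-4-2 network, as in Diagram 10
    outputs = forward(net, squashed[0])
    targets = [0.9, 0.1]  # hypothetical desired outputs
    print("outputs:", outputs)
    print("sum of squared errors:", sum_squared_error(outputs, targets))
    print("connections:", count_connections([3, 4, 4, 2]))
```

Each variable is squashed separately, column by column, so that loan size in pounds does not dominate an interest rate in percent; squashing all the numbers of one record together would mix units. In a full supervised run, the value returned by `sum_squared_error` would be compared with the acceptable level of error and, if it were too high, the weights would be adjusted and the run repeated.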
Diagram 12 - The operation of ANNs - flow diagram
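The "winner takes all" principle of the Kohonen net described in this chapter can likewise be sketched in Python. This is a simplified illustration resting on stated assumptions rather than Kohonen's full algorithm: the map is a single line of nodes, the starting weights are spaced evenly between the smallest and largest value of each input variable, and the learning rate and neighbourhood shrink linearly over the training run. The names `winner` and `train_kohonen` and the parameters `n_nodes`, `epochs`, `rate` and `radius` are hypothetical. Note that no target values are supplied: the net organises itself from the inputs alone.

```python
# Illustrative sketch only: all names and data here are hypothetical.
import math
import random


def winner(nodes, x):
    """Winner takes all: index of the node whose weight vector is closest to x."""
    return min(range(len(nodes)), key=lambda j: math.dist(nodes[j], x))


def train_kohonen(data, n_nodes=4, epochs=50, rate=0.5, radius=1, seed=0):
    """Unsupervised, competitive training of a one-dimensional Kohonen map.

    Nodes start evenly spaced between the smallest and largest value of each
    input variable. For each input, the winning node and its neighbours within
    `reach` positions along the line move part of the way towards that input;
    the learning rate and the neighbourhood both shrink every epoch.
    """
    dim = len(data[0])
    lows = [min(row[i] for row in data) for i in range(dim)]
    highs = [max(row[i] for row in data) for i in range(dim)]
    gaps = max(n_nodes - 1, 1)
    nodes = [[lo + (hi - lo) * j / gaps for lo, hi in zip(lows, highs)]
             for j in range(n_nodes)]
    rng = random.Random(seed)
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs        # falls from 1.0 towards 0
        step = rate * frac                 # shrinking learning rate
        reach = int(radius * frac + 0.5)   # shrinking neighbourhood (rounded half up)
        order = list(data)
        rng.shuffle(order)                 # new presentation order each epoch
        for x in order:
            win = winner(nodes, x)
            for j in range(max(0, win - reach), min(n_nodes, win + reach + 1)):
                # move node j a fraction `step` of the way towards the input
                nodes[j] = [w + step * (xi - w) for w, xi in zip(nodes[j], x)]
    return nodes


if __name__ == "__main__":
    # Two hypothetical groups of squashed (0.1 to 0.9) two-variable records
    data = [[0.1, 0.12], [0.12, 0.1], [0.11, 0.11],
            [0.9, 0.88], [0.88, 0.9], [0.89, 0.89]]
    nodes = train_kohonen(data)
    print([winner(nodes, x) for x in data])
```

With `radius=0` only the winning node is ever updated, which is pure winner-takes-all competitive learning; a radius of one or more also draws neighbouring nodes along, which is what allows similar inputs to end up on neighbouring nodes of the map.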
Chapter 4 - Investigation into advantages, disadvantages and current application of ANNs.

Introduction:

In order to allow a more complete consideration of the suitability of ANNs for a given decision making problem, it is necessary to appreciate the advantages and disadvantages that the use of ANNs brings. The self-learning pattern recognition ability which gives the ANN its distinctive characteristics also creates disadvantages, some of which are problem-specific, whilst others are universal.

ANNs are currently used for a wide variety of decision support problems. Perhaps the most commonly cited problem is that of distress modelling (prediction of bankruptcy), but examples are also found in such diverse areas as production control, new product development and traffic light sequence (road junction) modelling.

Advantages and disadvantages:

Numerous authors have commented on the advantages and disadvantages of ANNs, and the following, cited from Hammerstrom (1993), provides what could be regarded as the fundamental points of interest.

Advantages:
• They can infer subtle, unknown relationships from the data.
• They are non-linear, so that complex problems can be solved more accurately than by linear techniques.
• They are highly parallel, which makes them run faster than alternative methods on computers with parallel processors.

Additional benefits are given by Medsker, Turban and Trippi (1996) as:
• The ability to cope with highly correlated input data (multicollinearity; see also Tucker, 1996).
• A more highly automated input interface, made possible by the ANN's ability to process all inputs at once.
• Fault tolerance - due to the high number of nodes, inaccuracy caused by bad data can often be localised and need not affect the accuracy of the ANN as a whole.
• Generalisation - noisy, incomplete or previously unseen data will still result in a reasonable response, provided the ANN is suitably trained (Hawley, Johnson and Raina, 1990).
• Adaptability - training can occur during the ANN's in-service lifetime, allowing the ANN to remain up to date.

Hawley, Johnson and Raina (1990) comment that ANNs, because of the general purpose nature of their structure, are faster to install and maintain than custom built KBS. Training of ANNs, though time consuming, need not be as technically difficult (and therefore as expensive) as writing the program structure of a KBS.

Because of the way in which the relationships between weights are formed, ANNs are not prone to 'crashing' as a result of incomplete or inaccurate data, but are often said to degrade gracefully over time as the weight values alter. This is often cited as an advantage, but it should be borne in mind that something which happens progressively over time may not be noticed until damage has occurred and, without an adequate control process, may not be appreciated even then.

Disadvantages:
Hammerstrom (1993):
• They may fail to produce a satisfactory solution because of insufficient data [for training] or because no learnable (sic) function exists.
• They may produce results from a complex machine learning procedure that has no straightforward cause and effect origin that can be easily explained.
• They can be slow and expensive to train.
• The computational speed of an ANN, in the finished application, depends linearly on the number of connections and, roughly [approximately], on the square of the number of nodes.

To this list of disadvantages should be added the most criticised fault of ANNs:
• ANNs are not capable of demonstrating the logical reasoning behind the result obtained - the black box approach (Farrar, Tucker and Bugmann, 1997; Hopgood, 1993; Medsker, Turban and Trippi, 1996; Tucker and Farrar, 1996).

A problem recognised by Tucker (1996) and Tucker and Farrar (1996) is the relative 'youth' of ANNs as a decision making technique. Conventional decision making techniques have the advantage of many years of testing, both theoretical and practical, which has produced models that are both recognised and accepted. ANNs, by the very fact that they are relatively new and underdeveloped, do not have the weight of past experience to support their results. However, as development within the field progresses, refinement of the technique and the accumulation of empirical evidence should produce an improved methodology and increased statistical evidence of accuracy.

Current application of ANNs:

Hawley, Johnson and Raina (1990) provide a comprehensive discussion of ANN applications in finance, from which the following is adapted:

Corporate Finance Applications:
Financial Simulation: Whilst the financial management tasks of a company can be divided into various smaller and more manageable segments, the complexity of the company's internal and external environment is often misrepresented in these simplified models. ANNs provide a means of linking all segments together during analysis; they can be tailored to an individual company and are capable of being dynamic and responsive to change (Donaldson, 1996).

ANNs can be used for credit customer behaviour modelling, planning bad debt expenses, planning the cyclical expansion and contraction of accounts, evaluating credit terms and limits, cash management, evaluation of capital investments, asset and personal risk management (insurance), exchange risk management, and the prediction of credit cost and availability based on a company's financial data. Hawley et al (1990) claim that this area of ANN application offers, perhaps, the greatest potential for ANN business application.

Prediction: In determining a new policy, direction or product, organisations need to anticipate the reaction this choice will create among both present and future investors, and the subsequent effect this will have on their investment decisions. Whilst conventional decision making techniques are often more cost efficient at solving problems which have a well-identified theoretical underpinning, the problem of investor behaviour and sentiment is of a complexity more suited to ANNs.

Investors often base their decisions on a wide range of issues and information concerning both the company and the broader economic environment. It is possible to train ANNs to mimic the behaviour of investors (using actual investors as training models) and then determine the effect that alterations to company policy and financial position have on their investment decisions. ANNs offer the opportunity to incorporate a wide range of input and output information, enabling the decision maker to gauge reaction to change in ways other than through alterations in stock price alone (Hawley et al, 1990).

Evaluation: Accurately valuing a target company's net worth before attempting an acquisition increases the probability of success, both in terms of the acquisition outcome and with regard to eventual profit. ANNs are trained, in this instance, by exposure to training sets of target company data as input, with human experts' value estimates as the response. This use of ANNs seeks to copy the behaviour of individuals, including the incorporation of human "hunches" and intuition, which would make the use of conventional decision support programming difficult or impossible.

ANNs have been used successfully in a wide range of evaluation problems and confer advantages including: screening large numbers of companies for potential undervaluation or other forms of acquisition attractiveness, to minimise decision makers' time (they then need only look at "ideal" companies); the ability to copy the interpretations of a wide range of decision makers; and the ability to adjust automatically to the decision maker's changing analytical procedures and selection criteria over time (Hawley et al, 1990).

Credit Approval: Using a similar training approach to the company evaluation method detailed above, ANNs are capable of reducing time and labour by mimicking the decisions of financial staff in both credit approval and credit limit decisions. In addition, ANNs are able to interpret a wider range of financial statements (provided they are trained on a wide range) more quickly than their human counterparts, removing the need for the information to be restated in a standardised form (Jenson, 1992 and Marose, 1990).

Financial Institutions Applications:

Assessing Lending/Bankruptcy Risk: ANNs can provide expert opinion on loans and lending arrangements to financial institutions in a similar manner to the example of credit approval discussed above.
Security/Asset Portfolio Management: Due to the unstructured nature of the portfolio manager's decision making process and the diversity of information involved, ANNs offer advantages over conventional decision making techniques (Hawley et al, 1990).

Pricing Initial Public Offerings (of ordinary shares): Determining the issue price of ordinary shares is a complicated process, but one which it is essential to optimise (Brett, 1991). The information is often diverse and of a non-standard format, so the application of ANNs confers advantages not found in conventional decision making techniques, through their ability to generalise and their lack of reliance on an explicit rule base.

Professional Investors Applications:

Identification of Arbitrage Opportunities: By replicating an expert decision maker's reasoning process, a process he or she may not be able to articulate, the ANN is able to assist in the identification of companies which are about to become the targets of a hostile take-over (thus allowing the purchase of the company's shares to be initiated). ANNs offer advantages in their ability to screen large numbers of potential targets, giving the arbitrageur a smaller workload.

The Technical Analyst¹⁸: The pattern recognition abilities of ANNs enable patterns within stock markets (hitherto incalculable) to be emulated. Through this ability, more accurate predictions of share price movements can be derived (Davidson, 1996).

¹⁸ Influences on share price which are not related to company trading position.

The Fundamental Analyst: Industry norm patterns, market conditions and financial statements can be used to train ANNs to assist in share purchasing in a way similar to the technical analyst model.

Summary:

The advantage which ANNs confer over more traditional decision making/support techniques is their ability to discern patterns in large volumes of data through a process of self-learning, as opposed to explicit instruction. This process enables ANNs to discover patterns or relationships which may have been overlooked, or given too great an emphasis, in existing decision support mathematics. The ability of ANNs to identify patterns also enables them to recognise variations in handwriting, voices and images, and provides opportunities for a wide range of security applications.

Currently the use of ANNs in business is predominantly within bankruptcy prediction and financial risk assessment. However, ANNs have been used successfully in a wide range of operations management, marketing (data mining) and personnel applications.

ANNs have been shown to offer increased accuracy (Farrar, Tucker and Bugmann, 1997) and, in many instances, are one of the few methods currently available (e.g. for handwriting and voice recognition). Their ability to deal with non-linearity more easily than conventional methods (Waldrop, 1992) gives the user an advantage in the highly non-linear markets in which business operates (Cuthbertson and Gripaios, 1996).

ANNs "operate by a logic known only to themselves" (The Economist, 1995). The most difficult obstacle to overcome in the promotion of ANNs as a decision making tool is their lack of interpretability.
