Contents

PREFACE
ACKNOWLEDGMENTS
CHAPTER 1 INTRODUCTION
1.1 Why Neural Networks, and Why Now?
1.2 What Is a Neural Net?
    1.2.1 Artificial Neural Networks
    1.2.2 Biological Neural Networks
1.3 Where Are Neural Nets Being Used?
    1.3.1 Signal Processing
    1.3.2 Control
    1.3.3 Pattern Recognition
    1.3.4 Medicine
    1.3.5 Speech Production
    1.3.6 Speech Recognition
    1.3.7 Business
1.4 How Are Neural Networks Used?
    1.4.1 Typical Architectures
    1.4.2 Setting the Weights
    1.4.3 Common Activation Functions
    1.4.4 Summary of Notation
Preface

There has been a resurgence of interest in artificial neural networks over the last few years, as researchers from diverse backgrounds have produced a firm theoretical foundation and demonstrated numerous applications of this rich field of study. However, the interdisciplinary nature of neural networks complicates the development of a comprehensive, but introductory, treatise on the subject. Neural networks are useful tools for solving many types of problems. These problems may be characterized as mapping (including pattern association and pattern classification), clustering, and constrained optimization. There are several neural networks available for each type of problem. In order to use these tools effectively, it is important to understand the characteristics (strengths and limitations) of each.

This book presents a wide variety of standard neural networks, with diagrams of the architecture, detailed statements of the training algorithm, and several examples of the application for each net. In keeping with our intent to show neural networks in a fair and objective light, typical results of simple experiments are included (rather than the best possible). The emphasis is on computational characteristics, rather than psychological interpretations. To illustrate the similarities and differences among the neural networks discussed, similar examples are used wherever appropriate.

Fundamentals of Neural Networks has been written for students and for researchers in academia, industry, and government who are interested in using neural networks. It has been developed both as a textbook for a one-semester (or two-quarter) Introduction to Neural Networks course at Florida Institute of Technology and as a resource book for researchers. Our course has been developed jointly by neural networks researchers from applied mathematics, computer science, and computer and electrical engineering. Our students are seniors or graduate students in science and engineering; many work in local industry.

It is assumed that the reader is familiar with calculus and some vector-matrix notation and operations. The mathematical treatment has been kept at a minimal level, consistent with the primary aims of clarity and correctness. Derivations, theorems, and proofs are included when they serve to illustrate the important features of a particular neural network. For example, the mathematical derivation of the backpropagation training algorithm makes clear the correct order of the operations. The level of mathematical sophistication increases somewhat in the later chapters, as is appropriate for the networks presented in Chapters 5, 6, and 7. However, derivations and proofs (when included) are presented at the end of a section or chapter, so that they can be skipped without loss of continuity.

The order of presentation of the topics was chosen to reflect the increasing complexity of the networks. The material in each chapter is largely independent, so that the chapters (after the first chapter) may be used in almost any order desired. The McCulloch-Pitts neuron discussed at the end of Chapter 1 provides a simple example of an early neural net. Single-layer nets for pattern classification and pattern association, covered in Chapters 2 and 3, are two of the earliest applications of neural networks with adaptive weights. More complex networks, discussed in later chapters, are also used for these types of problems, as well as for more general mapping problems. Chapter 6, backpropagation, can logically follow Chapter 2, although the networks in Chapters 3-5 are somewhat simpler in structure. Chapters 4 and 5 treat networks for clustering problems (and mapping networks that are based on these clustering networks). Chapter 7 presents a few of the most widely used of the many other neural networks, including two for constrained optimization problems.

Algorithms, rather than computer codes, are provided to encourage the reader to develop a thorough understanding of the mechanisms of training and applying the neural network, rather than fostering the more superficial familiarity that sometimes results from using completely developed software packages. For many applications, the formulation of the problem for solution by a neural network (and the choice of an appropriate network) requires a detailed understanding of the networks, which comes from both performing hand calculations and developing computer codes for extremely simple examples.
Acknowledgments

Many people have helped to make this book a reality. I can only mention a few of them here.

I have benefited either directly or indirectly from short courses on neural networks taught by Harold Szu, Robert Hecht-Nielsen, Steven Rogers, Bernard Widrow, and Tony Martinez.

My thanks go also to my colleagues for stimulating discussions and encouragement, especially Harold K. Brown, Barry Grossman, Fred Ham, Demetrios Lainiotis, Moti Schneider, Nazif Tepedelenlioglu, and Mike Thursby.

My students have assisted in the development of this book in many ways; several of the examples are based on student work. Joe Vandeville, Alan Lindsay, and Francisco Gomez performed the computations for many of the examples in Chapter 2. John Karp provided the results for Example 4.8. Judith Lipofsky did Examples 4.9 and 4.10. Fred Parker obtained the results shown in Examples 4.12 and 4.13. Joseph Oslakovic performed the computations for several of the examples in Chapter 5. Laurie Walker assisted in the development of the backpropagation program for several of the examples in Chapter 6; Ti-Cheng Shih did the computations for Example 6.5; Abdallah Said developed the logarithmic activation function used in Examples 6.7 and 6.8. Todd Kovach, Robin Schumann, and Hong-wei Du assisted with the Boltzmann machine and Hopfield net examples in Chapter 7; Ki-suck Yoo provided Example 7.8.

Several of the network architecture diagrams are adapted from the original publications as referenced in the text. The spanning tree test data (Figures 4.11, 4.12, 5.11, and 5.12) are used with permission from Springer-Verlag. The illustrations of modified Hebbian learning have been adapted from the original publications: Figure 7.10 has been adapted from Hertz, Krogh, and Palmer, Introduction to the Theory of Neural Computation, © 1991 by Addison-Wesley Publishing Company, Inc. Figure 7.11 has been adapted and reprinted from Neural Networks, Vol. 5, Xu, Oja, and Suen, "Modified Hebbian Learning for Curve and Surface Fitting," pp. 441-457, 1992, with permission from Pergamon Press Ltd, Headington Hill Hall, Oxford OX3 0BW, UK. Several of the figures for the neocognitron are adapted from (Fukushima et al., 1983); they are used with permission of IEEE. The diagrams of the ART2 architecture are used with permission of the Optical Society of America and of Carpenter and Grossberg. The diagrams of the simple recurrent net for learning a context-sensitive grammar (Servan-Schreiber et al., 1989) are used with the permission of the authors.

The preparation of the manuscript and software for the examples has been greatly facilitated by the use of a Macintosh IIci furnished by Apple Computers under the AppleSeed project. I thank Maurice Kurtz for making it available to me.

I appreciate the constructive and encouraging comments of the manuscript reviewers: Stanley Ahalt, The Ohio State University; Peter Anderson, Rochester Institute of Technology; and Nirmal Bose, Penn State University.

I would like to thank the Prentice-Hall editorial staff, and especially Rick DeLorenzo, for their diligent efforts to produce an accurate and attractive product within the inevitable time and budget constraints.

But first, last, and always, I would like to thank my husband and colleague, Don Fausett, for introducing me to neural networks, and for his patience, encouragement, and advice when asked, during the writing of this book (as well as at other times).
1.1 WHY NEURAL NETWORKS AND WHY NOW?

As modern computers become ever more powerful, scientists continue to be challenged to use machines effectively for tasks that are relatively simple for humans. Based on examples, together with some feedback from a “teacher,” we easily learn to recognize the letter A or to distinguish a cat from a bird. More experience allows us to refine our responses and improve our performance. Although we may eventually be able to describe rules by which we make such decisions, these rules do not necessarily reflect the actual process we use. Even without a teacher, we can group similar patterns together. Yet another common human activity is trying to achieve a goal that involves maximizing a resource (time with one’s family, for example) while satisfying certain constraints (such as the need to earn a living). Each of these types of problems illustrates tasks for which computer solutions may be sought.

Traditional, sequential, logic-based digital computing excels in many areas but has been less successful for other types of problems. The development of artificial neural networks began approximately 50 years ago, motivated by a desire both to understand the brain and to emulate some of its strengths. Early