Synaptic Processing Unit final year project - Anthony Hsiao


Published in: Education

Imperial College London
Department of Electrical and Electronic Engineering
Final Year Project Report 2007

Project Title: The Synaptic Processing Unit
Student: Anthony Hsiao
Course: 4T
Project Supervisor: Dr. George Constantinides
Second Marker: Professor Alessandro Astolfi
Abstract

A small but growing community of engineers and scientists around the world is breaking new ground in the field of Neuromorphic Engineering, succeeding in designing ever more complex brain-inspired artificial neural systems and implementing them in low power analogue VLSI silicon chips.

A recently proposed synapse model, the binary cascade synapse, has memory properties that are superior to other comparable models, and it is suitable for implementation in digital hardware. Recent efforts have succeeded in designing FPGA implementations of these binary cascade synapses, but failed to implement a usefully large number of them on a single chip.

This project develops the FPGA implementation of binary cascade synapses further by radically changing the digital architecture, essentially designing a microprocessor that processes cascade synapses. This processor is called the Synaptic Processing Unit (SPU), and the prototype implementation can currently host up to 8192 cascade synapses.

This report describes the development of the SPU, which necessitated the development of a novel learning rule alongside it, called Spike Timing and Activity Dependent Plasticity (STADP), and presents a characterisation of this learning rule. Both the SPU and the learning rule are implemented on an FPGA and evaluated in-circuit.

Then, to put the SPU to an ultimate test, it was used together with an aVLSI neuron chip to form a neural system with binary cascade synapses, and was given a real classification task, in which it was taught to classify two greyscale images. The system does successfully classify the two images, which is a very encouraging result.

To the best of the author’s knowledge, the SPU presented here is the first hardware implementation of its kind with such a large number of synapses in the world.
The Synaptic Processing Unit, Anthony Hsiao

Acknowledgements

Thank you to all those people who have helped me get this far, both academically and otherwise, and to those who accompanied me along the way.

In particular, I would like to thank Dylan Muir at the Institute of Neuroinformatics for supervising my project, and being there whenever I needed help, especially during the crazy hours before the FPGA decided to take a holiday in the US.

I would also like to thank Dr. George Constantinides at Imperial College London for supervising my project and Prof. Alessandro Astolfi for second marking it. More words of thanks go to Prof. Alessandro Astolfi for coordinating my exchange to ETH Zurich, and for being patient when necessary and laid-back whenever possible.

Thank you Stefano Fusi, one of the most impressive characters I met at the Institute, for giving me initial feedback and coming up with the basis for what later became STADP.

Special thanks to Sungdo Choi and Daniel Fasnacht for all the help and support with the hardware and infrastructure; my computer was not struck by a particle from space, it turned out.

Special thanks to Johanna von Lindeiner for good nights on the bench, and the many inspiring exchanges. I actually mean it!

A very special thank you goes out to Pantha Roy, who is just amazing. Thanks for the good times, and for attempting to save me from becoming a social recluse during the final few weeks of this project.

An equally special thank you goes out to Siddharta Jha, another amazing character. Thank you for all those discussions and creative breaks, which really enriched my time at the institute.

A massive thank you to a fellow brother in work, Christopher Maltby, for enduring all those long days and longer nights of work with me. As you know, without your company, I would not have been able to get any work done, let alone finish.

I would like to thank my parents, Wendy and Tien-Wen, for their unconditional support and for opening so many doors for me. Without your efforts and sacrifices, I would not be where I am today, and would probably not get wherever I will get in five, ten years!

Finally, I would like to thank Dylan Muir again, because I am actually very grateful for all the help! Without your razor-sharp brain lobes and your patience and support, I would not have been able to achieve half of what I managed to do!
13 APPENDIX III – A JOURNEY THROUGH THE SPU
13.1 PRE-SYNAPTIC SPIKE
13.2 POST-SYNAPTIC SPIKE
14 APPENDIX IV – DESIGN HIERARCHY OF SOURCE FILES
List of figures

Figure 1: Image output of a silicon retina
Figure 2: Neurons of the world
Figure 3: Action potentials (spikes) are commonly described by three properties:
Figure 4: Action potentials of the world
Figure 5: CGI of a synapse with pre- and post-synaptic neurons
Figure 6: Micrograph of a synapse taken at the University of St. Louis
Figure 7: Different forms of synaptic plasticity
Figure 8: Schematic of a cascade model of synaptic plasticity
Figure 9: Initial signal-to-noise ratio as a function of memory lifetime, from [1]
Figure 10: Circuit diagram of an ultra low power integrate & fire neuron
Figure 11: Circuit diagram of the so-called diff-pair integrator (DPI) synapse
Figure 12: Prototype FPGA board developed by Daniel Fasnacht
Figure 13: Experimental hardware setup
Figure 14: STADP
Figure 15: The STADP mechanism
Figure 16: Simulated behaviour of STADP
Figure 17: System level interaction of SPU and aVLSI neuron chip
Figure 18: Bit representation of cascade synapses
Figure 19: SPU internal addressing format
Figure 20: Conceptual architecture of the SPU
Figure 21: A hybrid cellular automata linear array
Figure 22: Conventions on the arrows used in block diagrams
Figure 23: Spike forwarding module block diagram
Figure 24: STADP learning rule block diagram
Figure 25: Initialisation of delta_t look-up table
Figure 26: Flow diagram of the cascade synapses state update rule
Figure 27: Cascade module block diagram
Figure 28: Cascade memory block diagram
Figure 29: Input source selector block diagram
Figure 30: Pipelined SPU block diagram
Figure 31: Pipelined dataflow through the SPU
Figure 32: Block diagram of the integration of the SPU within the FPGA board
Figure 33: Comparison of delta_t_LUT content for 5kHz and 90MHz
Figure 34: Simulated hardware behaviour of STADP at 5kHz simulation clock frequency
Figure 35: Frequency response of the neural system
Figure 36: Oscilloscope screenshot of post-synaptic membrane potential:
Figure 37: Example of a coherent 30Hz Poisson spike train to all 256 synapses
Figure 38: Oscilloscope screenshot of post-synaptic membrane potential:
Figure 39: In-circuit verification of potentiation
Figure 40: In-circuit verification of depression
Figure 41: Oscilloscope screenshot of decreasing post-synaptic firing rate:
Figure 42: Using pictures as pre-synaptic stimuli
Figure 43: Spike trains derived from 16x16 pixel greyscale images of Anthony and Dylan
Figure 44: Conceptual procedure of a real classification task
Figure 45: Classification task: teach Dylan, show Dylan first, at 22Hz
Figure 46: Classification task: teach Dylan, show Anthony first, at 22Hz
Figure 47: Classification task: teach Dylan, show Dylan first, at 25Hz
Figure 48: Classification task: teach Dylan, show Anthony first, at 25Hz
Figure 49: Classification task: teach Anthony, show Anthony first, at 22Hz
Figure 50: Classification task: teach Anthony, show Dylan first, at 22Hz
Figure 51: Classification task: teach Anthony, show Anthony first, at 25Hz
Figure 52: Classification task: teach Anthony, show Dylan first, at 25Hz
Figure 53: Classification task: bottom-up teaching Dylan, at 50Hz
Figure 54: Classification task: bottom-up teaching Dylan, at 70Hz
Figure 55: Classification task: bottom-up teaching Dylan, for 2s at 50Hz
Figure 56: Classification task: bottom-up teaching Anthony, at 50Hz
Figure 57: Classification task: bottom-up teaching Anthony, at 70Hz
Figure 58: Classification task: bottom-up teaching Anthony, for 2s at 50Hz
Figure 59: Expected effects on a synapse
Figure 60: Pre-synaptic spike arrives at SPU
Figure 61: Valid pre-synaptic spike gets forwarded, after two clock delays
Figure 62: Valid pre-synaptic spike generates a plasticity event
Figure 63: Cascade synapse changes in operation
Figure 64: Plasticity events
Figure 65: Valid post-synaptic spike arrives at SPU
Figure 66: Post-synaptic spike does not get forwarded
Figure 67: Post-synaptic spike sets post-synaptic expiry time
1 Introduction

‘The brain – that’s my second most favourite organ!’ – Woody Allen

Solving the mystery of how the human brain works and computes will be one of the most significant discoveries in the history of science. A profound understanding of our most important organ (bar Woody Allen…) will have significant implications for healthcare, psychology and ethics, as well as for computing, robotics and artificial intelligence. Visionaries such as Ray Kurzweil go as far as predicting that, before the middle of the 21st century, humans and machines will be able to merge in a way never seen before, as brain interfaces enable users to bridge the gap between the real and virtual worlds to a level where the distinction between ‘real’ and ‘not real’ might lose its importance. Artificial systems would reach computational powers matching those of the human brain, only to surpass them a few years later.

Most people find it difficult to imagine such scenarios, especially since even the most powerful computers to date, which can perform billions of operations per second, cannot reproduce some of the computational magic that human brains perform on a day-to-day basis, such as pattern recognition or visual processing. ‘Intelligent’ and ‘interactive’ systems are neither intelligent nor interactive, and the most advanced robots in the world are no match for a young child when it comes to performing motor tasks or recognition; the thought of ever meeting a machine with intelligence, humour or an opinion goes far beyond what most people think their computers will ever be able to do.

Such future scenarios have been the topic of several books and films, and are portrayed as horror scenarios more often than not, ignoring many of the potential opportunities that such a future could bear. Without attempting to make any qualifying judgments, it should be noted that change happens, whether it is welcome or not.

This change could well be initiated by a small but growing community of engineers and scientists, driven by impressive advances in neuroscience, who are making significant progress in copying neuronal organisation and function into artificial systems. The secret to the human brain’s superior abilities appears to reside in how the brain organises its slow-acting electrical and chemical components (namely neurons, the basic computational units of the brain, and synapses, the interfaces between neurons, whose rich dynamics allow neurons to form interconnected neural circuits). Researchers sometimes speak of ‘morphing’ these structures of neural connections into silicon circuits, creating neuromorphic microchips. If successful, this work could lead to implantable silicon retinas for the blind or sound processors for the deaf that last for 30 years on a single nine-volt battery, or to low-cost, highly effective visual, audio or olfactory recognition chips for robots and other smart machines. The long-term goal is to engineer ever more complex artificial systems with ever richer behaviour, and ultimately to construct an artificial brain.

1.1 What is neuromorphic engineering?

The term neuromorphic was coined by Carver Mead in the late 1980s to describe Very Large Scale Integration (VLSI) systems containing analogue electronic circuits that mimic neuro-biological architectures present in the nervous system. Neuromorphic Engineering is a new interdisciplinary field that takes inspiration from biology, physics, mathematics and engineering to design analogue, digital or mixed-mode analogue/digital VLSI artificial neural systems. These include vision systems, head-eye systems, auditory processors and autonomous robots, whose physical architecture and design principles are based on those of biological nervous systems.

Although the field of neuromorphic engineering is still relatively new, impressive and encouraging results have already been achieved. Systems ranging from ‘simple’ chips with silicon neurons or synapses [13] to more complex systems such as a silicon retina or cochlea [13] have been demonstrated in the past.
Figure 1: Image output of a silicon retina, showing the head of a person at the Brains in Silicon Lab at Stanford University.

1.2 The topic of this project

This project focuses on one aspect of neuromorphic systems which is at the heart of some of the dynamics of neural networks, namely synapses. Fusi et al. have demonstrated how using ordinary bounded synapse models can have devastating effects on memory in scenarios with ongoing modifications, and proposed a new synapse model, the binary Cascade Synapse [1], which outperforms ordinary (binary) synapse models in several respects [9].

The nature of the Cascade Synapse makes it convenient to implement in digital hardware rather than analogue VLSI, and it would be useful to augment existing neuromorphic neuron chips with Cascade Synapse functionality. Such a neural system could then act as one single entity in a larger multi-chip environment.

Previous efforts have successfully designed individual cascade synapses and implemented a small number of them (eight, to be precise) on an FPGA; however, in order to perform useful computation in a reasonably sized neural system, a massive up-scaling of the number of synapses on one chip is necessary. In order to augment a typical aVLSI neuron chip with cascade synapse functionality, any number upwards of 4000 synapses would be desirable, or rather, necessary.

One way of doing this is to fundamentally change the way cascade synapses are implemented on the FPGA, referred to as virtualisation: rather than having a number of fixed hardware cascade synapses, which makes inefficient use of logic real estate, an abstraction of each synapse could be stored in memory, and only retrieved, processed and stored back on demand. Since memory, unlike logic, is generally cheap and abundant in digital circuits, this Synaptic Processing Unit (SPU) can potentially allow for a very large scale implementation of cascade synapses on a single FPGA.

1.3 Aims

  1. To develop a Synaptic Processing Unit based on an FPGA that implements a large number of cascade synapses
  2. To integrate the SPU with an aVLSI neuron chip to form a working neural system
  3. To demonstrate the capabilities of the neural system by performing a real classification task

1.4 Further report structure

This report is written for the scientifically and technically minded reader with background knowledge of the concepts of electronic engineering, and is further structured as follows:

  2. Background
     This chapter briefs the reader on the interdisciplinary background knowledge required for this project. In particular, it outlines some of the relevant biology and neuroscience, explains the binary cascade model used in more detail and describes the hardware and infrastructure environment the SPU will be working in.

  3. STADP – a novel Hebbian learning rule
     This chapter argues the case for developing a new learning rule called STADP, and describes how it works. It also presents an initial characterisation of the learning rule derived from simulation.

  4. Design
     This chapter starts by providing a summary of the features of the SPU, to allow the reader to get a first impression. Then, it outlines the high level design and argues for the system architecture used. It finishes by giving a set of specifications for a modular implementation of the design.
  5. Implementation
     This chapter starts by going off on a tangent, diving into the realm of random number generators. Then, it describes how the specifications given in the previous chapter were implemented in each module, and how the SPU integrates within the FPGA and its environment.

  6. Verification
     This chapter is a very short one, which only portrays the efforts undertaken in order to verify the design and implementation. It will not reproduce the verification efforts themselves.

  7. Evaluation & experimentation
     This is one of the key chapters and describes all the in-circuit verification and experimentation that has been carried out. Furthermore, it explains the real classification task given to the neural system, and presents the results.

  8. Discussion
     This chapter discusses the evaluation and experimentation results, and tries to make general statements about the operation of the SPU, and conclusions about the success of the classification task itself.

  9. Conclusion
     This chapter wraps up the report, and includes the conclusions derived from the work presented here. It objectively assesses advantages and disadvantages of the SPU, and suggests further improvements or changes to the system that might be worthwhile.

  10. References
     This chapter lists the sources that have been referred to while writing the report, as well as sources that have been used throughout the design and implementation of the SPU.

  11. Appendices
     There are four appendices: Appendix I with a list of supplementary Matlab files used throughout the project, Appendix II with a copy of the checklist used for verification, Appendix III with screenshots of waveforms showing the journey of a pre- and a post-synaptic spike through the SPU, and finally Appendix IV, listing the design hierarchy of the VHDL source files used.
2 Background

‘If the human brain were so simple that we could understand it, we would be so simple that we couldn’t’ – Emerson M. Pugh

2.1 Of brains, neurons and synapses

When IBM’s Deep Blue supercomputer beat then world chess champion Garry Kasparov during their rematch in 1997, it did so by means of sheer brute force and computational power. The machine evaluated some 200 million potential board moves a second, whereas Kasparov considered only three each second, at most 10.1.1. But despite Deep Blue’s victory (in fact, Kasparov won the first match against Deep Blue the year earlier, and IBM refused to agree to a third ‘deciding’ match [21]), computers are no real competition for the human brain in areas such as vision, hearing, pattern recognition and learning, not to mention their inability to display creativity, humour or emotions. And when it comes to operational efficiency, there is no contest at all. A typical room-size supercomputer weighs roughly 1,000 times more, occupies 10,000 times more space and consumes a millionfold more power than does the neural tissue that makes up the brain [22].

Clearly, computers and brains are fundamentally different, both in terms of architecture and performance. Table 1 summarises key differences between brains and (conventional) computers.

         Processing        Element   Energy   Speed       Style of computation      Fault
         elements          size      use                                            tolerant
Brain    ~10^11 neurons,   10^-6 m   30 W     100 Hz      Parallel, distributed,    Yes
         ~10^14 synapses                                  memory at computation
PC       10^9 transistors  10^-6 m   30 W     10^9 Hz +   Serial, centralized,      No
(CPU)                                                     memory distant to
                                                          computation

Table 1: A comparison between computers and brains
At the most basic cellular level, brains consist of a vast number of brain cells, an estimated 100 billion of them, called neurons. These are also believed to constitute the basic building blocks of computation within the central nervous system, and are in many ways analogous to logic gates in digital electronics. The brain’s network of neurons forms a massively parallel information processing system.

While there are a large number of different types of neurons, each with different functions and morphologies, most neurons are typically composed of a soma, or cell body, a dendritic tree and an axon, as shown in Figure 2.

Figure 2: Neurons of the world. There are many different types of neurons, each with different morphologies and functions, which are found in different parts of brains. Image courtesy of G. Indiveri.

One of the most important properties of a neuron is its membrane potential, the potential difference across the cell membrane, which is used to communicate between neurons. A complicated molecular mechanism that stems from the cell’s highly complex membrane can give rise to so-called action potentials or spikes, which are a sharp increase followed by an equally sharp drop in the membrane potential within a few ms. A neuron receives inputs, i.e. spikes, from other neurons, typically many thousands, on its dendritic tree, and integrates them (approximately) on its membrane potential. Once the membrane potential exceeds a certain threshold, the neuron generates a spike which travels from the body down the axon, commonly described as the output of a neuron, to the next neuron(s) (or other receptors). This spiking event is also called depolarisation, and is followed by a refractory period, during which the neuron is unable to fire. The membrane potential of a spiking neuron is shown conceptually in Figure 3, while Figure 4 shows some measurements of real action potentials. Typically, neurons fire at rates between 0Hz and about 100Hz, and both the precise timing of individual spikes and the firing rates of neurons are believed to play an important role in neural communication and computation.

Figure 3: Action potentials (spikes) are commonly described by three properties: pulse width, firing rate or inter-spike interval, and refractory period. Courtesy of Giacomo Indiveri.
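The integrate, threshold and refractory behaviour described above can be illustrated with a minimal leaky integrate-and-fire sketch in Python. This is not the aVLSI neuron circuit used in this project; the function name and all parameter values (weight, threshold, leak time constant, refractory period) are assumptions chosen purely for illustration.

```python
import math


def simulate_if_neuron(input_spike_times_ms, duration_ms=100.0, dt_ms=0.1,
                       weight_mv=2.0, threshold_mv=15.0,
                       leak_tau_ms=20.0, refractory_ms=2.0):
    """Return output spike times (ms) of a leaky integrate-and-fire neuron.

    All parameter values are illustrative assumptions, not measured values.
    The membrane potential is expressed relative to its resting value.
    """
    n_steps = int(round(duration_ms / dt_ms))

    # Count how many input spikes fall into each simulation time step.
    inputs = [0] * n_steps
    for t in input_spike_times_ms:
        step = int(round(t / dt_ms))
        if 0 <= step < n_steps:
            inputs[step] += 1

    decay = math.exp(-dt_ms / leak_tau_ms)   # leak applied once per step
    refractory_steps = int(round(refractory_ms / dt_ms))

    v = 0.0              # membrane potential relative to rest (mV)
    blocked_until = 0    # first step at which the neuron may integrate again
    output_spike_times = []

    for step in range(n_steps):
        v *= decay
        if step < blocked_until:
            continue     # refractory period: incoming spikes are ignored
        v += weight_mv * inputs[step]        # integrate incoming spikes
        if v >= threshold_mv:                # threshold crossed: fire
            output_spike_times.append(step * dt_ms)
            v = 0.0                          # reset after the spike
            blocked_until = step + refractory_steps
    return output_spike_times
```

With these assumed values, a few isolated input spikes leak away without effect, whereas many near-coincident inputs push the potential over the threshold and produce an output spike.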
Figure 4: Action potentials of the world. Courtesy of Giacomo Indiveri, modified by Anthony Hsiao.

The axon endings of neurons almost touch the dendrites or cell body of the next neuron. The gap between two neurons is a specialised structure called a synapse and is the point of transmission of spikes from the pre-synaptic neuron to the post-synaptic neuron, as shown in Figure 5 and Figure 6. This transmission is effected by neurotransmitters, chemicals which are released from the pre-synaptic neuron upon depolarisation, and which bind to receptors in the post-synaptic neuron, thereby advancing its depolarisation. Most synapses are excitatory, i.e. they increase the depolarisation of the post-synaptic neuron, although there are also so-called inhibitory synapses (with inhibitory neurotransmitters), which render a post-synaptic neuron less excitable. The human brain is estimated to have a vast 10^14 synapses.

The extent to which a spike from one neuron is transmitted on to the next, the synaptic efficacy or weight, depends on many factors, such as the amount of neurotransmitter available or the number and arrangement of receptors, and is not constant, but changes over time. This property is called synaptic plasticity, and it is this variable synaptic strength that is believed to give rise to both memory and learning capabilities, which makes it particularly interesting to study synapses!
Figure 5: CGI of a synapse with pre- and post-synaptic neurons. Excerpt of the 2005 Winner of the Science and Engineering Visualisation Challenge. By G. Johnson, Medical Media, Boulder, CO.

Figure 6: Micrograph of a synapse taken at the University of St. Louis. In the centre of the image is the synaptic cleft, which separates the pre- (top) and post-synaptic neuron (bottom). The pre-synaptic neuron has clearly visible vesicles which contain neurotransmitters. Upon pre-synaptic depolarisation, these neurotransmitters are released and diffuse across the synaptic cleft, to be received by receptors on the post-synaptic neuron, advancing its depolarisation.

Scientists have developed various models of the underlying molecular mechanisms of synaptic plasticity, describing it to good levels of accuracy; however, it is important to appreciate that there are details of synaptic plasticity which are still the subject of ongoing research.
2.2 Synaptic plasticity at the heart of learning in neural systems

There are several underlying mechanisms that cooperate to achieve synaptic plasticity, including changes in the quantity of neurotransmitter released into a synapse and changes in how effectively cells respond to those neurotransmitters [7]. As memories are believed to be represented by vastly interconnected networks of synapses in the brain, synaptic plasticity is one of the important neuro-chemical foundations of learning and memory. The strengthening of a synapse, Long-Term Potentiation (LTP), and its weakening, Long-Term Depression (LTD), are widely considered to be the major mechanisms by which learning happens and memories are stored in the brain.

Many models of learning assume some kind of activity-based plasticity, whereby an increase in synaptic efficacy arises from the pre-synaptic cell’s repeated and persistent stimulation of the post-synaptic cell. These kinds of learning rules are commonly referred to as Hebbian learning rules, popularly summarised as ‘What fires together, wires together’.

Another particularly prominent, experimentally observed form of long-term plasticity is called Spike-Timing Dependent Plasticity (STDP), and depends on the relative timing of pre- and post-synaptic action potentials. If a pre-synaptic spike is quickly followed by a post-synaptic spike, then there appears to be some kind of causality, since the pre-synaptic neuron has contributed to the depolarisation of the post-synaptic neuron, and the two should be connected more strongly, by potentiating the synapse. Conversely, if a pre-synaptic spike is directly preceded by a post-synaptic spike, their connection should be weakened, and the synapse gets depressed. Different forms of observed plasticity that can be described by STDP are shown in Figure 7.
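The timing rule just described can be written down as a very small Python sketch that only returns the direction of the change. The function name, the use of t_post - t_pre as the timing difference and the 20 ms coincidence window are assumptions made for illustration; they are not taken from a particular experiment or from the learning rule developed later in this report.

```python
def stdp_signal(t_pre_ms, t_post_ms, window_ms=20.0):
    """Return 'potentiate', 'depress' or None for one pre/post spike pair.

    window_ms is an assumed coincidence window; pairs further apart than
    this are treated as unrelated and produce no plasticity signal.
    """
    dt = t_post_ms - t_pre_ms        # positive: pre-synaptic spike came first
    if 0 < dt <= window_ms:
        return "potentiate"          # pre shortly before post: causal pairing
    if -window_ms <= dt < 0:
        return "depress"             # post shortly before pre: acausal pairing
    return None                      # simultaneous or too far apart
```

For example, a pre-synaptic spike at 10 ms followed by a post-synaptic spike at 15 ms yields a potentiation signal, while the reverse order yields a depression signal.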
Figure 7: Different forms of synaptic plasticity. The amount (qualitatively) and type of synaptic modification evoked by repeated pairing of pre- and post-synaptic action potentials in different preparations. The horizontal axis is the difference t_pre - t_post of these spike times. Results are shown for slice recordings of different neurons. Without going into unnecessary detail, the important point to note is that different forms of plasticity exist. Figure from Abbott & Nelson 2000.

Several other models of synaptic plasticity exist, ranging over several levels of complexity and biological plausibility. Each has its advantages and disadvantages, proposing different mechanisms of synaptic plasticity and trying to explain different types of experimentally observed plasticity. Other global regulatory processes of learning, such as synaptic scaling or synaptic redistribution, are thought to be necessary alongside activity-based learning rules [5].

While learning rules and models of synaptic plasticity attempt to describe the mechanism by which synaptic plasticity is generated, different models of synapses themselves exist, which can vary greatly in the way they respond to ‘plasticity signals’.

2.3 The cascade synapse model

Storing memories of ongoing, everyday experiences requires a high degree of synaptic plasticity, while retaining these memories demands protection against changes induced by further activity and experiences. Models in which memories are stored through switch-like transitions in synaptic efficacy are good at storing but bad at retaining memories if these transitions are likely, and they are poor at storage but good at retention if they are unlikely [1]. In order to address this dilemma, Fusi et al. developed the model of binary cascade synapses, which combines high levels of memory storage with long retention times and significantly outperforms conventional models [9].
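As a rough illustration of the behaviour detailed in the remainder of this section, the following Python sketch models a single binary cascade synapse. It assumes a single response probability that halves with every state down the cascade, used for both kinds of transition, and it assumes that a synapse in the last state simply stays there. The class and method names are hypothetical, and this is not the FPGA implementation described later in this report.

```python
import random


class CascadeSynapse:
    """Illustrative binary cascade synapse (names and details are assumed)."""

    def __init__(self, n_states=5, strong=False, state=1, rng=None):
        self.n_states = n_states   # depth of each cascade
        self.strong = strong       # binary efficacy: weak (False) or strong (True)
        self.state = state         # 1 = most plastic, n_states = least plastic
        self.rng = rng if rng is not None else random.Random()

    def response_probability(self):
        # Probability 1 in state 1, halving for every state further down.
        return 0.5 ** (self.state - 1)

    def apply(self, potentiate):
        """Apply one binary plasticity signal and report what happened."""
        if self.rng.random() >= self.response_probability():
            return "ignored"               # signal not responded to this time
        if potentiate != self.strong:
            self.strong = potentiate       # switching: change efficacy ...
            self.state = 1                 # ... and enter the most plastic state
            return "switched"
        if self.state < self.n_states:
            self.state += 1                # chaining: become less plastic
            return "chained"
        return "saturated"                 # assumed: last state stays as it is
```

Repeatedly calling `apply(True)` on a weak synapse first switches it to strong and then moves it down the cascade, so that later depression signals are increasingly likely to be ignored.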
  23. 23. They consider the case of binary synapses, i.e. synapses with only two efficacies (for example potentiated and depressed, or weak and strong), which is not implausible, since biological synapses have been reported to display binary states of efficacy as well [2].
The structure of a binary cascade model is shown in Figure 8, specifying two independent dimensions for each synapse. Just like ordinary models of binary synapses, a binary cascade synapse can be in one of two states of efficacy, weak or strong, but while ordinary models only allow one fixed value of plasticity, cascade synapses possess a cascade of n states with varying degrees of plasticity, implementing metaplasticity (i.e. the plasticity of plasticity). Ongoing plasticity then corresponds to transitions of a synapse between states characterized by different degrees of plasticity, rather than (only) different synaptic strengths.
Figure 8: Schematic of a Cascade Model of Synaptic Plasticity. Courtesy of Stefano Fusi. There are two levels of synaptic strength, weak (yellow) and strong (blue), denoted by + and -. Associated with these strengths is a cascade of n states (n = 5 in this case). Transitions between state i of the cascade of either strength and state 1 of the opposite strength take place with probability qi, corresponding to conventional synaptic plasticity. Transitions with probabilities pi± link the states within the respective cascade (downward arrows), corresponding to metaplasticity.
Binary cascade synapses can respond to any learning rule with binary plasticity signals, i.e. signals that are either ‘potentiate’ or ‘depress’, and respond to them stochastically; plasticity signals are only responded to with a given probability which
  24. 24. is determined by the state along the cascade the synapse is in. So it is the varying probability of responding to plasticity signals that implements the different degrees of plasticity described above.
In the highest state (state 1 of the cascade in Figure 8), the probability of responding to a plasticity event is 1, and it decreases for states further down the cascade, where the synapse becomes less plastic. In the model analysed by Fusi, the plasticity actually halves for every state down the cascade, i.e. there is a 50% chance of responding to a plasticity signal in the second state, 25% in the third, and so forth.
A cascade synapse can respond to plasticity events in two ways, referred to as switching and chaining, depending on whether it already has the ‘right’ efficacy. If it switches, then it is changing efficacy, i.e. from weak to strong, or vice versa. If a synapse switches, it will always make a transition to state 1, i.e. the most plastic state, of the opposite cascade, regardless of what state it was in before. In Figure 8, these transitions are represented by the arrows between the two cascades, with plasticity probabilities given by qi. If the synapse chains, i.e. it already has the right efficacy, then it is moving down one state in the cascade, thereby reducing (halving) its plasticity probability and becoming less plastic. In Figure 8, this is represented by the downward arrows connecting consecutive states within each cascade, with plasticity probabilities given by pi±.
Thus, cascade synapses can respond to ongoing modifications by reducing their plasticity, thereby ‘reassuring’ their state of efficacy. Another way of looking at it is that synaptic efficacies and their degree of plasticity are dependent on the history of the synapses and the plasticity signals they received.
Fusi et al.
assess the performance of cascade synapses against that of ordinary binary synapses by comparing the strength of an initial memory trace, the initial signal-to-noise ratio, as well as the average memory lifetime, the point at which this signal-to-noise ratio becomes equal to 1, for both synapse models (it is worthwhile to reiterate that it was this trade-off, the ability to store memories easily vs. retaining them for a long time, that originally led them to develop the cascade synapse model in the first place). They find that cascade models arrive at a better compromise, storing new memories
  25. 25. more easily and faithfully, yet retaining them for a longer period of time, as shown in Figure 9. Without going into unnecessary detail (the interested reader is advised to consult [1] for more information), they find that the better performance of cascade synapses stems from the fact that they experience power-law forgetting, unlike ordinary binary synapses, which experience exponentially fast decay of their memories.
Figure 9: Initial signal-to-noise ratio as a function of memory lifetime, from [1]. The initial signal-to-noise ratio of a memory trace stored using 10^5 synapses plotted against the memory lifetime (in units of 1 over the rate of candidate plasticity events). The blue (lower) curve is for a binary model with synaptic modification occurring with probability q that varies along the curve. The red (upper) line applies to the cascade model described by Fusi et al. The two curves have been normalised so that the binary model with q = 1 gives the same result as the n = 1 cascade model to which it is identical. Clearly, the cascade model performs better than the ‘normal’ binary model both in terms of initial signal-to-noise ratio and memory lifetime.
In summary, binary cascade synapses outperform their ‘ordinary counterpart’ in terms of memory storage and retention, which derives from the more complex structure allowing the synapse to respond to ongoing modifications along two dimensions – efficacy and metaplasticity. It is desirable to implement these nice properties into real hardware, and previous attempts have already laid good groundwork for that.
2.4 Previous work
This project mainly builds upon two previous projects. The first one, titled ‘A stochastic synapse for reconfigurable hardware’, a short project during the Telluride workshop for Neuromorphic Engineering by Dylan Muir [15], laid the groundwork
  26. 26. for both the following and this project. In particular, it succeeded in creating a first VHDL implementation of the cascade synapse and verified its operation in simulations. One of the biggest contributions of this project is the design of one particular type of pseudo-random number generator, the Hybrid Cellular Automata array pseudo-random number generator, which also found extensive use in this current project. However, no actual hardware was synthesised from the digital design.
The second project, ‘A VHDL implementation of the Cascade Synapse Model’, a diploma project by Tobias Kringe [16], succeeded in designing and implementing a small array of cascade synapses onto an FPGA. The operation of the digital cascade synapses was verified both in simulation and in hardware, and encouraging results were achieved in confirming the complex behaviour of the cascade synapse (which is why this current project will not focus on reproducing and re-verifying the properties of hardware implemented cascade synapses). However, the VHDL implementation was rather large, and only a small number of synapses could be implemented onto the FPGA. It was Tobias Kringe who proposed to virtualise the cascade synapses (which is one of the aims of this current project) in order to realise a useful number of synapses onto one FPGA. Due to the radically different architecture of the virtualised synapses to the static hardware synapses, next to none of his VHDL implementation was reused.
To the best of the knowledge of the author, there has been no other working hardware implementation of a large number of cascade synapses (in fact, of any number of synapses) to date.
2.5 Overview of the hardware environment
Neuromorphic aVLSI hardware commonly comprises low power analogue CMOS circuits operating in the subthreshold regime, which mimic (morph) the properties of real neural systems and elements.
In particular, a neuromorphic aVLSI neuron chip was used, which comprised an array of leaky Integrate & Fire (I&F) silicon neurons with Diff-Pair Integrator (DPI) synapses. Communication to the outside world was done using the asynchronous Address Event Representation (AER) protocol. The
  27. 27. FPGA is sitting on an FPGA board developed at the Institute of Neuroinformatics in Zurich.
2.5.1 Silicon neurons
There are different types of silicon neurons, such as conductance based models, which aim to map molecular conductance mechanisms underlying neuron behaviour in detail into analogue electronic circuits, or more qualitative models such as the I&F neuron model, which merely implements the observed characteristics of neuron behaviour into silicon, such as integration, firing or the refractory period.
The aVLSI chip used in this project contained 128 I&F neurons similar to the circuit depicted in Figure 10. Qualitatively, this I&F circuit works by integrating input current from on-chip synapses on its membrane, and elicits a (voltage) spike if the membrane voltage crosses a firing threshold.
Figure 10: Circuit diagram of an ultra low power Integrate & Fire Neuron. Labelled functional circuit elements mimic the behaviour of real neurons. Transistors operate in the sub-threshold regime to exploit their desirable exponential characteristics. A capacitor Cmem integrates incoming post-synaptic current into a membrane voltage Vmem. If the membrane potential crosses the spiking threshold, it will ‘spike’ just like a real neuron. Courtesy of Giacomo Indiveri.
  28. 28. 2.5.2 Silicon synapses
Each I&F neuron has 32 silicon synapses with different properties and behaviour connected to it, but only one type of synapse was used in this project, namely the static DPI synapse. The circuit of such a synapse is depicted in Figure 11. Qualitatively, the DPI synapse works by receiving a (voltage) spike from a pre-synaptic neuron (or from the outside world), and then injects a given amount of current onto the membrane of the post-synaptic neuron it is connected to in response. The amount of current produced by every incoming spike is dependent on the static synaptic weight and the time constant of the synapse, which can be adjusted to achieve the desired static synaptic weight.
Figure 11: Circuit diagram of the so called Diff-Pair Integrator (DPI) synapse. For every pre-synaptic spike it receives, it dumps a post-synaptic current onto the membrane of the post-synaptic neuron connected to it. The amount of current, and other dynamics, can be set by parameters such as the synaptic weight, the time constant tau or the threshold voltage.
2.5.3 Communication using AER
The Address Event Representation (AER) protocol is used to allow for communication in multi-chip environments. It is a serial asynchronous four-phase handshaking protocol (using request-acknowledge signals) which encodes events (i.e. spikes) of individual neurons by assigning each neuron a unique address (up to
  29. 29. 16 bits). Every time a neuron fires, it generates an address event, which is then transmitted over the AER bus to receiving hardware. Unlike conventional electronic systems with arrays of information sources, such as digital cameras, neuromorphic systems using the AER protocol do not scan through every one of their elements to transmit one frame after another; rather, information is transmitted on demand. Only if a neuron spikes will an address event be transmitted. One of the most important points about the AER protocol is its asynchrony, whereby the precise timing of the address event implicitly encodes the time of the spike itself, so there is no need to communicate timestamps for individual spikes.
Conveniently, since electronic circuits implementing neuromorphic hardware are very fast, while neural activity is rather slow (<100 Hz), a large number of neurons can share the same AER bus without problem. Typically, an AER bus would have a bandwidth of about 1 Mevent/second.
2.5.4 The FPGA board
The FPGA used in this project is a Xilinx Spartan 3 (xc3s400pq208) that sits on a prototype FPGA board developed by Daniel Fasnacht during his diploma project at the Institute of Neuroinformatics in Zurich, depicted in Figure 12. Features used in this project are the USB interface and the two AER ports (one input, one output). It has an external clock of 106.125 MHz, and is programmed using JTAG.
Apart from developing the board itself, Daniel Fasnacht further developed a Linux driver to allow communication with the USB board. A program developed by Giacomo Indiveri is used to send data to the FPGA board. In particular, pre-synaptic spikes are sent through the USB bus to the SPU by specifying a synapse address and an inter-spike interval to the previous spike, data which is easily generated using the spiking neuron toolbox (footnote 1) in Matlab.
The aVLSI neuron chip is configured using Matlab (footnote 2).
Footnote 1: Developed by Dylan Muir at the Institute of Neuroinformatics.
Footnote 2: To set up the environment variable for the aVLSI chip in Matlab: chipinit.m. To load the required calibration settings to the chip: bias_050607.m
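As an illustration of the stimulus format described above (a synapse address paired with the inter-spike interval to the previous spike), the following Python sketch generates such pairs for one Poisson spike train and estimates how much of a 1 Mevent/second AER bus a population of neurons would occupy. It is only a stand-in for the Matlab toolbox used in the project: the function names, the use of seconds as the interval unit and the tuple layout are all assumptions, since the actual USB data format is not specified in this section.

```python
import random

def poisson_spike_events(address, rate_hz, duration_s, seed=0):
    """Return (synapse_address, inter_spike_interval_s) pairs for one
    Poisson spike train. Units and layout are assumed for illustration."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:
        isi = rng.expovariate(rate_hz)   # exponential interval, mean 1/rate_hz
        t += isi
        if t > duration_s:
            return events
        events.append((address, isi))

def bus_load(n_neurons, mean_rate_hz, bus_events_per_s=1e6):
    """Fraction of the AER bus bandwidth used if every neuron fires at
    mean_rate_hz; 1 Mevent/second is the typical figure quoted above."""
    return n_neurons * mean_rate_hz / bus_events_per_s
```

For example, `bus_load(128, 100.0)` evaluates to 0.0128, i.e. even 128 neurons all firing at 100 Hz would use little more than 1% of such a bus.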
  30. 30. It should be noted that this is a prototype board, and with experimental or prototype hardware, extra care should be taken, since not all functions necessarily work as expected. However, seeing experimental hardware work and become ‘alive’ is one of the most gratifying moments of hardware development.
In the experimental setup used for the classification task (as described in Section 7.5, A real classification task) the FPGA board interfaces with an aVLSI ‘IFSLTWA’ neuron chip, using the AER connections to send address events to, and receive feedback from, the neurons. Figure 13 illustrates this experimental setup.
Figure 12: Prototype FPGA board developed by Daniel Fasnacht. 1. Xilinx Spartan 3 (xc3s400pq208) 2. USB port 3. AER-out port 4. AER-in port
  31. 31. Figure 13: Experimental hardware setup. 1. FPGA SPU 2. Forward AER connection 3. aVLSI chip with array of I&F neurons 4. Oscilloscope measuring the post-synaptic membrane potential 5. Post-synaptic feedback AER connection (with logic analyzer) 6. Pre-synaptic stimuli input USB connection.
2.5.5 Software
Throughout this project, three software packages were used, namely Xilinx ISE 9.1i Webpack to code the VHDL design, Modelsim PE Student Edition to simulate VHDL code, and Matlab for various things, including plotting, initialization file generation, analysis and spike train generation.
A project diary was kept on Google Documents.
  32. 32. 3 STADP – a novel Hebbian learning rule
‘The illiterate of the 21st century will not be those who cannot read and write, but those who cannot learn, unlearn, and relearn’ – Alvin Toffler
In the previous section, the general concept of synaptic plasticity was introduced. While different learning rules have been proposed, for the task at hand, keeping in mind that the Synaptic Processing Unit is to be tested on a real classification task, it is necessary to implement a learning rule that is both suitable for the learning task in a general environment and easily implemented in digital hardware. There are several learning rules that would be interesting to implement, most prominently STDP, among others [18], [3], [20], but none really meets the needs of this project.
From [19] and [20], it was concluded that ordinary STDP would not be sufficient as a general learning rule. Instead, the system would either have to be taught with specifically crafted and highly correlated temporal patterns (not a general environment), or a more elaborate version of STDP would have to be constructed, which is impractical for the implementation, both in terms of hardware real estate (memory in particular, but also logic) and circuit complexity. Prototype designs for STDP were rejected on the basis of requiring excessive memory and overcomplicating the digital circuit.
Instead, a novel but very simple, easily implemented learning rule was developed together with [20], called Spike-Timing and Activity Dependent Plasticity (STADP), which produces simple binary plasticity events, depress and potentiate, as required by the binary cascade synapse model.
3.1 STADP – Yet another learning rule?
At the heart of STADP is the same Hebbian learning paradigm, that ‘what fires together, wires together’.
Unlike STDP, which derives the causality for ‘firing together’ from the difference in spike times, STADP uses a mixture of firing time and
  33. 33. firing rate based measures to determine whether pre- and post-synaptic neurons ‘fire together’.
As the name suggests, STADP produces plasticity signals depending on spike timing as well as activity. In particular, it is dependent on the state of activity of the post-synaptic neuron, and the timing of pre-synaptic spikes.
STADP states that the post-synaptic neuron can be in one of two states at any point in time: active and inactive. This state is determined by a threshold function of the post-synaptic firing frequency: if it is above a mean firing rate fm, it is said to be active, otherwise it is inactive. For example, a setup of aVLSI I&F neurons could have a mean firing rate fm = 50Hz, which is biologically plausible, and be said to be active for firing rates above 50Hz, and inactive for firing rates below 50Hz.
Then, two neurons are said to ‘fire together’ if a pre-synaptic spike arrives while the post-synaptic neuron is active, and the synapse should be potentiated (LTP). The reverse is also true, i.e. when a pre-synaptic spike arrives at the synapse while the post-synaptic neuron is inactive, then the synapse should be depressed (LTD).
However, this scheme would result in one plasticity signal for every pre-synaptic spike, so in order to condition the number of plasticity signals produced, STADP is stochastic, and only produces potentiation or depression signals with a certain probability, called the probability of plasticity, p(plasticity). Figure 14 below summarises how STADP produces plasticity events.
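The decision made on every pre-synaptic spike can be expressed as a minimal Python sketch. The function name `stadp_event` and its arguments are illustrative assumptions; the sketch takes the state of the post-synaptic neuron as given and does not model how that state is estimated.

```python
import random

def stadp_event(post_active, p_plasticity, rng):
    """Plasticity signal produced by one incoming pre-synaptic spike.

    With probability p_plasticity a signal is emitted: 'potentiate' if the
    post-synaptic neuron is currently active, 'depress' if it is inactive.
    Otherwise no signal is produced (None).
    """
    if rng.random() >= p_plasticity:
        return None
    return "potentiate" if post_active else "depress"
```

Passing the random number generator in explicitly keeps the sketch reproducible, e.g. `stadp_event(True, 0.1, random.Random(42))`.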
  34. 34. Figure 14: STADP. Plasticity events are elicited with a probability p(plasticity), and depend on the spike time of the pre-synaptic neuron and the activity of the post-synaptic neuron.
3.1.1 From spike time to spike rate
The two state abstraction of the post-synaptic neuron’s activity essentially requires an integration of its spike times to produce spike rates. However, integration of spikes arriving at irregular intervals into spike rates can be a non-trivial task in real time processing in digital hardware (it would be very easy in analogue electronics actually!). In STADP, this is elegantly performed using a stochastic process, inspired by quantum physics [20]. The main idea behind this is that the post-synaptic neuron is in an unknown state of activity until it gets ‘measured’, in this case by an incoming pre-synaptic spike.
Every time the post-synaptic neuron spikes, its state of activity is set to active, independent of the current state. A neuron in the active state can then make a transition to the inactive state with a probability p(deactivate) (this can also be regarded as a two state hidden Markov process), as depicted in Figure 15.
Without specifying what p(deactivate) is at any point in time, it can be appreciated how a post-synaptic neuron firing at the mean firing rate fm should have a probability of being in the active state, p(active), of 0.5, a more active neuron should have a higher p(active) and a less active neuron should have a lower p(active).
  35. 35. Figure 15: The STADP mechanism. A post-synaptic neuron can be in one of two states: active and inactive. The STADP mechanism determines the state of the post-synaptic neuron by integrating the post-synaptic firing times. A post-synaptic spike sets the neuron to the active state, which then stochastically resets to the inactive state after an amount of time equal to the mean post-synaptic inter-spike interval. Clearly, the probability that the post-synaptic neuron is in the active state at any given time increases as its firing rate increases, and is 0.5 if it is firing at the mean firing rate.
In order to implement this in real hardware (it would be rather challenging to actually instantiate some kind of quantum process), the STADP mechanism proposed here uses an abstraction of the stochastic deactivation of the post-synaptic neuron. This abstraction is based on the assumption that the neuron fires as a Poisson process with mean firing rate fm, which has an exponentially distributed inter-spike interval (the time interval between two consecutive spikes) with mean 1/fm. Then, upon every incoming post-synaptic spike (which sets the neuron’s state to active), an exponentially distributed ‘expiry time’ is drawn, after which the neuron is said to reset to the inactive state.
This way, the desired properties can be achieved: if the post-synaptic neuron is firing at the mean firing rate fm, it will have an equal chance of being in the active or inactive state, on average, at any point in time. Similarly, if it is firing at a higher rate, it has a higher chance of being active since it is being set to active faster than it is expiring to inactive, while if it is firing at a lower rate, it has a lower chance of being active at any point in time.
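The expiry time mechanism can be checked numerically with a short Python sketch. It assumes, beyond what the text states, that the post-synaptic neuron fires as a Poisson process at an arbitrary rate f_post (not only at fm); under that assumption the time since the last post-synaptic spike is exponentially distributed with rate f_post, and the probability that the expiry time (exponential with mean 1/fm) has not yet elapsed works out to f_post / (f_post + fm). Function names are hypothetical, and this is not the Matlab simulation used in the project.

```python
import random

def p_active_closed_form(f_post, f_mean):
    """P(active) for Poisson post-synaptic firing at rate f_post and an
    exponential expiry time with mean 1/f_mean (assumed model)."""
    return f_post / (f_post + f_mean)

def p_active_monte_carlo(f_post, f_mean, samples=20000, seed=1):
    """Estimate P(active) by sampling; requires f_post > 0 and f_mean > 0."""
    rng = random.Random(seed)
    active = 0
    for _ in range(samples):
        since_last_spike = rng.expovariate(f_post)  # time since last post spike
        expiry = rng.expovariate(f_mean)            # expiry drawn at that spike
        if expiry > since_last_spike:               # not yet reset to inactive
            active += 1
    return active / samples
```

With fm = 50 Hz the closed form gives exactly 0.5 at f_post = 50 Hz and 2/3 at f_post = 100 Hz, so under this model p(active) rises with the firing rate but does not reach 1 within the 0-100 Hz range.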
  36. 36. One question remains. Whether a plasticity event is a depression or a potentiation event is dependent on the post-synaptic neuron’s activity as explained above, but then, how does STADP behave for different pre-synaptic frequencies? As the name suggests, the plasticity is dependent on spike timing, since the state of activity of the post-synaptic neuron is only ever evaluated on an incoming pre-synaptic spike, but in fact, its rate plays a role too.
In general, the higher the pre-synaptic frequency, the more plasticity events will be produced. However, since potentiation and depression are only elicited with probability p(plasticity), the dependence on the pre-synaptic rate is slightly more complex. While high pre-synaptic frequencies are likely to lead to a high rate of plasticity, low, but non-zero, pre-synaptic frequencies are likely not to result in any plasticity event at all, as only few of the already rare pre-synaptic spikes would ever lead to a plasticity event.
In summary, the pre-synaptic firing rate can be said to determine the rate (probability) of plasticity events, while the post-synaptic frequency is best described as setting the type of the plasticity events. Synapses with high pre-synaptic firing rates are more likely to be receiving plasticity signals, while synapses with low pre-synaptic firing rates are likely to remain static, as they receive none or only few plasticity events.
3.2 Characteristics of STADP
The previous section explained how, conceptually, STADP works, and how the actual STADP mechanism, which draws an exponentially distributed expiry time for the post-synaptic neuron to reset to the inactive state, works.
The following paragraphs describe some of its characteristics as well as the expected plasticity signals that STADP would produce.
When characterising the behaviour or the results of STADP, the two important points to be noted are firstly whether the expiry time mechanism works at all, and secondly what plasticity profile it produces over a range of pre- and post-synaptic frequencies. By observing p(active), the correct operation of the mechanism can be verified; by
  37. 37. observing the plasticity rates, i.e. how many potentiation or depression events are elicited per second, insights into the plasticity profile can be gained.
The following plots were obtained from a simple Matlab simulation (footnote 3) done by Dylan Muir, and show the rate of potentiation (LTP rate), rate of depression (LTD rate), the net effect of plasticity (LTP rate – LTD rate) as well as p(active), over pre- and post-synaptic frequency ranges of 0-100Hz.
Footnote 3: p(active) curve: make_prob_active_vs_freq_plot.m; other plots: make_freq_sim_plot.m
Figure 16: Simulated behaviour of STADP. Left column: rate of potentiation and depression events per second, over a range of pre- and post-synaptic frequencies [1:100Hz] (ignore the axis labels). Right column: Net effect of STADP and probability of the post-synaptic neuron being in the active state per unit time.
These simulation results suggest that STADP indeed works as a Hebbian learning rule, and has the desired characteristics. The p(active) is approximately 0.5 at a post-synaptic frequency of 50Hz, it increases for higher frequencies, and decreases for lower frequencies. Furthermore, the plasticity rate increases with pre-synaptic
  38. 38. frequency for both potentiation and depression, which also have a qualitatively correct behaviour, best summarized by the net effect of LTP and LTD: with increasing pre-synaptic frequencies, there are more plasticity events, with potentiation dominating for high post-synaptic frequencies, and depression dominating for low post-synaptic frequencies.
One important characteristic to note, however, is that potentiation and depression are not symmetric within the regime of operation, and that the net effect of plasticity has a bias towards depression, or equivalently, reluctance towards potentiation. This is due to the p(active) curve, which is not linear or symmetric about the (50Hz, 0.5) point. As will be described later in the experimental section, this will have an observable effect.
Possible remedies for this could include measures such as pre-biasing or distorting the p(active) curve so that it saturates at 100Hz, or setting a minimum expiry time of 10ms (1/100Hz) in order to ensure that p(active) is 1 at 100Hz. The remedy used would have to be matched to the particular implementation of STADP.
While more detailed and formal analysis of STADP would be desirable, this would go beyond the scope of this report. These initial simulation results are satisficing (= satisfying enough), and confidence in the learning rule further derives from [20].
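The asymmetry noted above can be made concrete with a small Python sketch of the expected plasticity rates. It is not the Matlab simulation used for Figure 16: it assumes Poisson post-synaptic firing, so that p(active) = f_post / (f_post + fm), and it uses an arbitrary p(plasticity) of 0.1. Both choices, and the function name, are assumptions made for illustration only.

```python
def stadp_rates(f_pre, f_post, f_mean=50.0, p_plasticity=0.1):
    """Expected (LTP rate, LTD rate, net rate) in events per second.

    Assumes p(active) = f_post / (f_post + f_mean), which holds for
    Poisson post-synaptic firing with an exponential expiry time.
    """
    p_active = f_post / (f_post + f_mean)
    ltp = f_pre * p_plasticity * p_active
    ltd = f_pre * p_plasticity * (1.0 - p_active)
    return ltp, ltd, ltp - ltd
```

Under these assumptions the net effect is zero at f_post = fm, but its magnitude at f_post = 100 Hz is only one third of its magnitude at f_post = 0 Hz, which reproduces the bias towards depression qualitatively.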
  39. 39. 4 Design
‘I am enough of an artist to draw freely upon my imagination. Imagination is more important than knowledge. Knowledge is limited. Imagination encircles the world’ – Albert Einstein
4.1 Summary of features of the Synaptic Processing Unit
The Synaptic Processing Unit designed here has the following features:
• Speed of operation: Clocked at 90MHz internally
• System architecture:
  o Fully pipelined design – the SPU can theoretically process a new address event every clock cycle, although this never happens in practice
  o Modular design – allows for easy plug-in of a new learning rule
• On-chip learning rule: STADP with 11.1ns time resolution
• I/O ports: 1x USB input, 1x AER input, 1x AER output
• Cascade representation: 6bit, reconfigurable, allowing for synapses with up to 32 cascades
• Cascade memory address width: 13bit, reconfigurable, allowing for up to 8192 binary cascade synapses
• Addressing: Configurable number of neurons (up to 256)
• One teacher synapse per neuron
4.2 System level design
Although this project builds upon previous work as mentioned earlier, most parts of the Synaptic Processing Unit were designed from scratch, since the pipelined and virtualized cascade synapse requires a very different architecture.
  40. 40. 4.2.1 The SPU in a neural system
From a high level point of view, the SPU is supposed to integrate with one aVLSI neuron chip, forming one coherent neural system containing an array of neurons with cascade synapse functionality. This system could, for example, be used as one layer of a larger network of spiking neurons, as depicted in Figure 17.
Figure 17: System level interaction of SPU and aVLSI neuron chip. Together, these form one freely reconfigurable integrated array of N Integrate and Fire neurons with binary cascade synapses.
4.2.2 Input and output ports
In order to act as one coherent system, the SPU has to be able to communicate both with the neuron chip and with the outside world. Here, this is done using the USB port of the FPGA board as pre-synaptic input, and the two AER ports to connect the SPU to the neuron chip.
Clearly, a forward connection, whereby pre-synaptic spikes are routed towards the right post-synaptic neuron, is necessary. However, in order to be able to perform learning using STADP, and indeed most other learning rules, an additional feedback connection from the neuron chip back to the SPU is necessary, in order to obtain information about the post-synaptic neurons, which in this case means to estimate their state of activity.
  41. 41. 4.3 Virtualising the cascade synapse
The binary cascade model is quite a nice model to be implemented in digital hardware. It has essentially only two important properties, namely its binary efficacy and its current state, which at the same time encodes the plasticity, which in turn is represented by a plasticity probability, which halves for every higher cascade. This has ‘digital’ written all over it.
In order to virtualise the cascade synapses, some conceptual ‘cascade mechanism’ by which to process them has to be devised. The basic idea is to trade hardware real estate on the FPGA for memory, and to process synapses on demand. This has two immediate design deliverables:
• In order to virtualise the cascade synapses, an abstraction or memory representation of them has to be defined.
• A mechanism by which they are processed, i.e. how individual synapses respond to plasticity signals, has to be developed.
Conveniently, the cascade synapse can be represented by a bit vector very intuitively. One bit encodes the synaptic efficacy, while a number of other bits, depending on the number of cascades, encode the state of the synapse, i.e. the synaptic plasticity, i.e. the plasticity probability. Then, halving the plasticity probability is just a matter of a bit shifting operation. As depicted in Figure 18, an N-bit representation is used, where the MSB represents the efficacy, and the remaining word [N-2...0] represents the plasticity probability as an unsigned binary number.
Figure 18: Bit representation of cascade synapses
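A minimal Python sketch of this bit representation is given below, using the 6-bit width fixed in this design. It is not the VHDL implementation: the function names, the way the random number is compared against the plasticity probability, and the fact that the probability is allowed to shift all the way down to 0 are assumptions of the sketch.

```python
import random

N_BITS = 6                          # cascade representation width used in this design
PROB_BITS = N_BITS - 1              # bits below the efficacy bit (MSB)
PROB_MAX = (1 << PROB_BITS) - 1     # 31 stands for probability 1 (most plastic state)

def pack(efficacy, prob):
    """Combine efficacy bit and plasticity probability into one word."""
    return (efficacy << PROB_BITS) | prob

def unpack(word):
    """Split a word into (efficacy bit, plasticity probability)."""
    return word >> PROB_BITS, word & PROB_MAX

def update(word, potentiate, rng):
    """Apply one plasticity signal to a stored synapse word."""
    efficacy, prob = unpack(word)
    # Respond with probability prob / PROB_MAX (assumed comparison scheme).
    if rng.randrange(PROB_MAX) >= prob:
        return word                       # signal ignored
    target = 1 if potentiate else 0
    if efficacy != target:
        return pack(target, PROB_MAX)     # switch: most plastic state, opposite efficacy
    return pack(efficacy, prob >> 1)      # chain: halve the probability by shifting
```

Starting from `pack(0, PROB_MAX)`, a potentiation signal always switches the synapse to `pack(1, PROB_MAX)`, and a further potentiation signal chains it to `pack(1, 15)`.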
  42. 42. Using this representation, the plasticity probability ranges from 0 to 2^(N-1) - 1 rather than from 0 to 1, but this is not a problem, since it can be regarded as the numerator of a rational number with denominator 2^(N-1) - 1. Such a representation can easily be stored in and retrieved from memory, and provides the functionality required to implement the virtualisation.
Here, N = 6 was fixed as a reasonable maximum cascade representation width, allowing for synapses with up to 32 cascades. This is more than sufficient, and in fact, too large a number of cascades can actually decrease the memory performance of the synapses [1].
The processing on the cascade synapse can be expected to be relatively simple, since there is only a small number of things the synapse ‘can do’: switch or chain, with a probability given by its state. The exact mechanism implemented is described in detail in the
  43. 43. Implementation section, but from a high level description point of view, it has to:
• Obtain the right cascade from memory
• Perform the necessary operations on its state representation (i.e. switch, chain or do nothing)
• Produce a new cascade state representation, and pass it back to the cascade memory
4.4 SPU internal addressing
Since incoming and outgoing events follow the AER protocol, whereby neurons are identified by addresses, the SPU internal representation also uses addresses as identifiers of synapses.
Figure 19: SPU internal addressing format
At the heart of the addressing scheme are the synapses, which can be identified uniquely by an N-bit synapse address, as shown in Figure 19. For historical reasons (footnote 4), this synapse address is set to 13 bits, allowing it to uniquely identify up to 8192 synapses. The top few bits of the synapse address represent the neuron address, which uniquely identifies the post-synaptic neuron to which the cascade synapse connects.
Footnote 4: The SPU was originally designed to interact with an aVLSI chip with 256 neurons and 8192 synapses, the largest of its kind at that time.
The aspect ratio of the neural system, i.e. how many neurons there are and how many synapses each has, can be changed freely within the SPU by changing
  44. 44. this neuron address width, and does not have to correspond to the actual number of neurons (or synapses) on the aVLSI chip.

4.5 Modular design of the SPU

Apart from implementing cascade synapse behaviour in a virtualised fashion, the SPU has to perform two other important tasks: spike forwarding and learning.

Overall, the core of the SPU, i.e. ignoring data I/O and FPGA board particulars, has the following four modules:
  • Forwarding module
  • Learning module
  • Cascade module
  • Cascade memory

The conceptual architecture that stems from these four modules is depicted in Figure 20.

Figure 20: Conceptual architecture of the SPU

The principle of operation of the SPU is as follows:
  45. 45.
  1. The signal selector (not one of the core functions of the SPU) arbitrates between pre- and post-synaptic inputs, and forwards the selected address into the SPU, to the forwarding module, the cascade memory and the learning module.
  2. The cascade memory retrieves the cascade synapse representation corresponding to the synapse address and, at the same time, writes new cascade states to (another location in) memory.
  3. The learning module (stochastically) produces plasticity signals, as required by STADP, from the pre- and post-synaptic spikes the SPU receives.
  4. The forwarding module forwards pre-synaptic addresses on to the output of the SPU if, and only if, the efficacy of the synapse is high.
  5. The cascade module (stochastically) processes the cascade representation according to the plasticity signals it receives from the learning module, and passes on a new cascade state to be written by the cascade memory.

This architecture can be fully pipelined, so that the SPU can process one ‘instruction’, i.e. one address event, per clock cycle. This is particularly important in order to ensure that the SPU is operating fast enough: in a multi-chip environment it should not be the processing bottleneck, but should be able to process whatever is being thrown its way by the pre-synaptic input (USB). Since the AER bus can typically transmit about 1 Mevent/second, the SPU should be able to process a multiple of that, which a fully pipelined architecture allows.

In order to ensure that only the ‘right’ signals are being processed and that no wrong data is written to memory, the SPU uses an extra level of control signals that indicate the validity of the data shown in Figure 20.

4.6 Module specifications

The high-level relationship between the individual modules described above translates into precise input/output and functional specifications, described below.
  46. 46.
4.6.1 Forwarding

Function:
  • To forward valid pre-synaptic spikes to the post-synaptic neuron address over the AER output of the SPU, if the ‘target’ synapse has high efficacy or a teacher signal was sent.

Input signals:
  • neuron_address: address of the synapse the current pre-synaptic spike is addressed to. Up to 13 bits.
  • target_synapse_efficacy: MSB of the cascade representation of the addressed synapse. 1 bit.
  • address_pre_post: control signal issued by the signal selector which indicates whether current data comes from the pre-synaptic (‘0’) or the post-synaptic (‘1’) feedback input. 1 bit.
  • address_valid: control signal that indicates whether current data is valid. 1 bit.

Outputs:
  • target_neuron_address: address of the post-synaptic neuron that is to be sent out through the AER output. Up to 8 bits.
  • target_address_valid: control signal that indicates whether the target neuron address is valid. 1 bit.

4.6.2 Learning Rule (STADP)

Function:
  • To implement STADP
  • To correctly produce plasticity events (dep./pot.)

Inputs:
  • synapse_address: address of the incoming pre- or post-synaptic spike. Up to 13 bits.
  • address_pre_post: control signal issued by the signal selector which indicates whether current data comes from the pre-synaptic (‘0’) or the post-synaptic (‘1’) feedback input. 1 bit.
  • address_valid: control signal that indicates whether current data is valid. 1 bit.

Outputs:
  • cascade_synapse_address: address of the cascade synapse that the plasticity signals are valid for. Up to 13 bits.
  • plasticity_dep_pot: plasticity signal, indicating whether the cascade synapse should be depressed (‘0’) or potentiated (‘1’). 1 bit.
  • plasticity_valid: control signal that indicates whether the plasticity signal and the cascade synapse address are valid. 1 bit.
  47. 47.
4.6.3 Cascade Process

Function:
  • To process cascade states according to plasticity signals from the learning module

Inputs:
  • cascade_synapse_state: cascade state representation of the cascade synapse that is to be processed. Up to 6 bits.
  • cascade_synapse_address: address of the current cascade synapse that the plasticity signals are valid for. Up to 13 bits.
  • plasticity_dep_pot: plasticity signal, indicating whether the cascade synapse should be depressed (‘0’) or potentiated (‘1’). 1 bit.
  • plasticity_valid: control signal that indicates whether the plasticity signal and the cascade synapse address are valid. 1 bit.

Outputs:
  • cascade_address_out: address of the cascade synapse that the new state belongs to. Up to 13 bits.
  • new_state: new processed cascade state representation ready to be written back to memory. Up to 6 bits.
  • new_state_valid: control signal that indicates whether the new state and the cascade out address are valid. 1 bit.

4.6.4 Cascade memory

Function:
  • To retrieve cascade representations of synapses addressed at its read port
  • To store valid and new cascade representations of synapses addressed at its write port

Input signals:
  • synapse_address: address of the cascade the current pre-synaptic spike is addressed to. Up to 13 bits.
  • new_state_address: address of the new state that has undergone plasticity. Up to 13 bits.
  • new_state: new state of cascade synapse after processing. Up to 6 bits.
  • new_state_valid: control signal that indicates whether the new state for the new state address is valid. 1 bit.

Outputs:
  • current_state: cascade state representation of the synapse addressed at the read port. Up to 6 bits.
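The address and state formats used in these specifications can be illustrated with a small software sketch. It is not the VHDL implementation: the function names are hypothetical, the widths are the 13-bit synapse address, 8-bit neuron address and N = 6 cascade width quoted in the text, and the layout (neuron address in the top bits, efficacy in the MSB, plasticity probability numerator in the remaining N-1 bits) follows the descriptions above.

```python
from fractions import Fraction

# Widths quoted in the report; the helper names below are hypothetical.
SYNAPSE_ADDRESS_WIDTH = 13
NEURON_ADDRESS_WIDTH = 8
CASCADE_WIDTH = 6  # N, the maximum cascade representation width


def split_synapse_address(address,
                          synapse_width=SYNAPSE_ADDRESS_WIDTH,
                          neuron_width=NEURON_ADDRESS_WIDTH):
    """Split a synapse address into (neuron address, synapse index within neuron)."""
    local_width = synapse_width - neuron_width
    neuron = address >> local_width              # top bits: neuron address
    local = address & ((1 << local_width) - 1)   # bottom bits: synapse index
    return neuron, local


def decode_cascade_state(state, width=CASCADE_WIDTH):
    """Decode a cascade word into (efficacy bit, plasticity probability)."""
    efficacy = (state >> (width - 1)) & 1        # MSB: synaptic efficacy
    numerator = state & ((1 << (width - 1)) - 1)
    denominator = (1 << (width - 1)) - 1         # 2^(N-1) - 1
    return efficacy, Fraction(numerator, denominator)


if __name__ == "__main__":
    print(split_synapse_address(0b1010101000011))
    print(decode_cascade_state(0b010000))
```

Using an exact fraction makes the "numerator of a rational number" reading explicit; the hardware only ever stores the integer numerator.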
  48. 48.
4.6.5 Global signals

In addition to the inputs specified above, all modules share clock, clock enable and asynchronous reset inputs; the reset clears all internal registers and FIFOs. Note that the content of memory is not reset to the initial state by this reset signal; only the output registers of the memory are cleared. All signals internal to the SPU are active high.
  49. 49.
5 Implementation

‘It's not good enough that we do our best; sometimes we have to do what's required’ – Winston Churchill

5.1 Pseudo-random number generators

The performance of stochastic learning processes, indeed of any stochastic process, is heavily dependent on the ‘quality’ of the underlying randomness. Since the SPU has random processes in two of its major functional components, the cascade synapse module and the learning rule, implementing a good pseudo-random number generator (pRNG) is even more important.

A good pRNG generates highly uncorrelated sequences of pRNs with a very long period before the sequence repeats. A good review of ‘classical’ pRNGs can be found in [8]; however, the pRNG used here is more unconventional. Most classical pRNGs generate pRNs by mathematical manipulation, including multiplication by prime numbers and modulo division, which is rather resource intensive for a digital logic implementation. Instead, a so-called hybrid cellular automata (HCA) array pRNG is employed, which is a very efficient choice for FPGA implementation.

Cellular automata consist of grids of ‘cells’, where each cell can be in one of a finite number of states. Time is discrete, and each cell has a local update rule that determines its state in the next unit of time. One of the most popular cellular automata is Conway’s 2D ‘Game of Life’.

Here, we consider a one-dimensional binary HCA, i.e. an array of bits, where each cell (bit) has one of two local update rules, namely Rule 90 or Rule 150, as shown in Figure 21, classified by Wolfram [16]. Rule 90 takes the XOR of both of its neighbours to determine the next state of a cell, while Rule 150 additionally XORs in the current value of the cell. Cells beyond the boundaries of the array are considered to be 1 at all times, which ensures that the automaton does not freeze in case of all cells being 0.
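The update rules just described can be sketched in a few lines of software. This is only an illustration of the Rule 90 / Rule 150 mechanics with the boundary cells held at 1; the function name and the four-cell rule assignment are hypothetical and are not the maximum-length configuration used in the SPU.

```python
def hca_step(cells, rule150_mask, boundary=1):
    """Advance a one-dimensional hybrid cellular automaton by one time step.

    cells        -- list of 0/1 cell values
    rule150_mask -- list of booleans, True where a cell uses Rule 150,
                    False where it uses Rule 90
    boundary     -- value assumed for cells beyond either end of the array
    """
    n = len(cells)
    next_cells = []
    for i in range(n):
        left = cells[i - 1] if i > 0 else boundary
        right = cells[i + 1] if i < n - 1 else boundary
        bit = left ^ right            # Rule 90: XOR of both neighbours
        if rule150_mask[i]:
            bit ^= cells[i]           # Rule 150: also XOR in the cell itself
        next_cells.append(bit)
    return next_cells


if __name__ == "__main__":
    # Arbitrary illustrative rule assignment (assumption, not from the report).
    mask = [False, True, False, True]
    state = [0, 0, 0, 0]
    for _ in range(3):
        state = hca_step(state, mask)
        print(state)
```

Starting from the all-zero state, the boundary cells immediately inject ones at both ends, which is the non-freezing behaviour the text refers to.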
These choices and the right configuration for the rules used
  50. 50. ensure that the pRNG produces maximum-length sequences of uniform pRNs. In [8], there is a detailed description of which rules to use for what bit position to generate maximum-length sequences for HCA arrays of a given size.

Figure 21: A Hybrid Cellular Automata linear array. The HCA pRNG makes use of two different nearest neighbour update rules, namely Rule 90 and Rule 150. It is very suitable for implementation on an FPGA, and further produces maximal-length sequences of highly uncorrelated patterns. Figure courtesy of Dylan Muir.

If used as described above, HCA pRNGs would introduce high correlation between adjacent cells, which can be avoided by only using a subset of non-neighbouring bits from a larger array to generate random numbers. One possible choice for creating a 32-bit random number, for example, is to use a 128-bit HCA and tap off every fourth bit to form the pRN.

By using this method to generate pRNs as required by the different modules, the stochastic processes in the SPU can be trusted to be as random as is possible, to the best of the knowledge of the author.

5.2 Description of generics

Before explaining the architecture of the individual SPU internal modules, it is helpful to understand the parameterisation of the VHDL code that was carried out in order to
  51. 51. keep the SPU reconfigurable. The following is a brief description of the generics used within the implementation that allow a customisation of the SPU.
  • SYNAPSE_ADDRESS_WIDTH : natural := 13: The synapse address width is the width of most of the addresses within the SPU, and sets the maximum number of synapses that can be addressed. By default, it is set to 13 bits, allowing for up to 8192 cascade synapses to be addressed. The fixed depth of the cascade memory (the memory itself is not parameterisable) also limits the maximum number of synapses to be implemented to 8192, although fewer synapses may be used (manual reconfiguration of the memory would be required to increase the depth of the cascade memory; this is not difficult).
  • NEURON_ADDRESS_WIDTH : natural := 8: The neuron address width is the width of the neuron address, and tells the SPU how many of the synapse address’ MSBs are attributed to identifying the neuron. By default, it is set to 8 bits, allowing for up to 256 neurons to be addressed; a smaller number of neurons can be specified without problems.
  • CASCADE_WIDTH : natural := 5: The cascade width is the number of bits that the cascade representation uses. It can be up to 6 bits wide, as limited by the width of the cascade memory, but fewer bits, such as the default value of 5 bits, may be specified. The cascade width includes both the efficacy bit and the plasticity probability width. At the same time, the cascade width specifies the width of the pRN generated in the cascade synapse module, which is always one bit less than the cascade width (since the plasticity probability in the cascade representation, which will be compared to the pRN, is one bit smaller than the cascade width).
  • PRE_THRESHOLD : natural := 230: The pre threshold sets the p(plasticity) with which STADP elicits plasticity events; the higher the threshold, the smaller the p(plasticity).
It may range from 0 to 255, where p(plasticity) would be 1 and 0 respectively.

Using these four parameters, the SPU can be configured, at compile time, to have the desired characteristics.
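To make the effect of the four generics concrete, the quantities they imply can be worked out in a short sketch. The class and method names are hypothetical; the default values are the ones listed above. The p(plasticity) estimate rests on an assumption not stated in this section, namely that a uniformly distributed 8-bit pRN is compared against the threshold and an event is elicited when the pRN is strictly greater. Under that assumption a threshold of 0 gives 255/256 rather than exactly 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpuGenerics:
    """Software mirror of the four VHDL generics (hypothetical helper)."""
    synapse_address_width: int = 13
    neuron_address_width: int = 8
    cascade_width: int = 5
    pre_threshold: int = 230

    def max_synapses(self):
        return 1 << self.synapse_address_width

    def max_neurons(self):
        return 1 << self.neuron_address_width

    def synapses_per_neuron(self):
        # Bits of the synapse address left over after the neuron address.
        return 1 << (self.synapse_address_width - self.neuron_address_width)

    def plasticity_probability_bits(self):
        # One bit of the cascade word is the efficacy bit.
        return self.cascade_width - 1

    def p_plasticity_estimate(self):
        # ASSUMPTION: uniform 8-bit pRN r in 0..255, event when r > threshold.
        return (255 - self.pre_threshold) / 256


if __name__ == "__main__":
    g = SpuGenerics()
    print(g.max_synapses(), g.max_neurons(), g.synapses_per_neuron())
    print(g.plasticity_probability_bits(), g.p_plasticity_estimate())
```

With the defaults this gives 8192 synapses, 256 neurons and 32 synapses per neuron; narrowing the neuron address width trades neurons for synapses per neuron without changing the total.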
  52. 52.
5.3 Module level design

The following sections individually describe the implementations of the SPU’s modules on a functional level. In order to save paper and time, no VHDL code is reproduced here. The interested reader is advised to consult the supplementary CD for the VHDL code.

In all of the diagrams shown in the following sections, the convention shown in Figure 22 for arrows is used. In particular, dotted arrows represent the flow of control signals, dashed arrows represent addresses, and solid arrows represent the flow of data.

Figure 22: Conventions on the arrows used in block diagrams

Furthermore, light blue vertical bars are used to indicate register levels or clocked processes.

5.3.1 Spike forwarding

The forwarding module is the simplest of the four major functional modules. As specified in the previous chapter, it ‘only’ has to forward valid pre-synaptic spikes if the synapse they are addressed to has high efficacy, or if they are addressed to the teacher synapse. The basic structure of the forwarding module is shown in Figure 23.

The outputs are generated in a very simple way. The target neuron address is simply forwarded directly from the incoming neuron address, while the target address valid signal is a simple chain of logic operations. Note that the target address valid signal is dependent on the negation of the address_pre_post signal, since a pre-synaptic input spike is represented by a ‘0’.
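The forwarding decision can be written out as a behavioural sketch. It is a software illustration rather than the VHDL: the function name and default widths are assumptions, the target neuron address is taken to be the top bits of the incoming address, and the teacher synapse is taken to be synapse 0 of each neuron, i.e. the address whose bits below the neuron address are all zero.

```python
def forward_spike(synapse_address, target_synapse_efficacy,
                  address_pre_post, address_valid,
                  synapse_width=13, neuron_width=8):
    """Return (target_neuron_address, target_address_valid) for one event.

    address_pre_post is 0 for a pre-synaptic spike and 1 for a
    post-synaptic (feedback) spike; all other flags are active high.
    """
    local_width = synapse_width - neuron_width
    # Teacher synapse (assumed): the 0th synapse of the addressed neuron.
    is_teacher = (synapse_address & ((1 << local_width) - 1)) == 0
    target_neuron_address = synapse_address >> local_width
    is_pre_synaptic = address_pre_post == 0
    target_address_valid = bool(
        address_valid
        and is_pre_synaptic
        and (target_synapse_efficacy or is_teacher)
    )
    return target_neuron_address, target_address_valid


if __name__ == "__main__":
    # Pre-synaptic spike to neuron 5, synapse 3, with high efficacy.
    print(forward_spike((5 << 5) | 3, 1, 0, 1))
    # Same synapse with low efficacy: not forwarded.
    print(forward_spike((5 << 5) | 3, 0, 0, 1))
```

In hardware this reduces to a handful of gates plus a register stage, which is consistent with the module needing only a single clock cycle.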
  53. 53. Figure 23: Spike forwarding module block diagram

The teacher synapse is defined to be the 0th synapse of every neuron, i.e. if the bottom bits of the synapse address (how many depends on the neuron address width) are all zero, then the spike is addressed to the teacher synapse, and should be forwarded regardless of the synaptic efficacy.

Due to its simplicity, the forwarding module only requires one clock cycle to perform the processing.

5.3.2 Learning rule (STADP)

The learning rule module is much more complex, as shown in Figure 24. It contains some logic, several registers, a look-up table implemented by a 256x36bit single port ROM, a 256x36bit single port block RAM, a 36-bit timer with 11.1 ns resolution and an 8-bit pRNG. In order to understand it, it is best to work backwards from the outputs, and to consider separately what happens on a pre- and on a post-synaptic synapse address (spike).

There are three output signals: the cascade synapse address, the plasticity signal and the plasticity valid signal, which need to be considered first.