Brains rely on spiking neural networks for ultra-low-power information processing. Building artificial intelligence with similar efficiency requires learning algorithms that can instantiate complex spiking neural networks and brain-inspired neuromorphic hardware that emulates them efficiently. Toward this end, I will briefly introduce surrogate gradients as a general framework for training spiking neural networks and showcase their robustness and self-calibration capabilities on analog neuromorphic hardware. Drawing further inspiration from biology, I will discuss the impact of homeostatic plasticity and of network initialization in the excitatory-inhibitory balanced regime on deep spiking neural network training. Finally, I will show how approximations relate surrogate gradients to biologically plausible online learning rules with only a minor impact on their effectiveness.
Information processing with artificial spiking neural networks
Friedemann Zenke
Computational Neuroscience @ FMI
Affiliated with the Novartis Institutes for BioMedical Research; an affiliated institute of the University of Basel
www.zenkelab.org
Some ways to improve the efficiency of deep nets for AI
● Improve models (inductive biases, architecture)
● Continual learning
● Unsupervised learning (data efficiency)
● Reduce data transfer and communication (avoid the von Neumann bottleneck)
● Use of analog and mixed-signal circuitry (in-memory computing, analog neurons/synapses)
● Sparsity of activity and connections (e.g., event-based processing, network pruning)
Today’s focus is on spiking neural networks
Training spiking neural networks
● Network translation: usually makes a rate-coding assumption and results in high activity levels.
● Direct training (surrogate gradients): very general, no coding assumptions, truly end-to-end.
Today’s focus is on spiking neural networks
● Part 1: Brief intro on surrogate gradients
● Part 2: Self-calibrating analog neuromorphic hardware
● Part 3: Can we do without backprop through time? What’s the optimal initialization for spiking nets?
Input: Spatiotemporal spike trains
Example for this talk: “Randman”, synthetic spike times generated from smooth random manifolds; a binary discrimination task.
Zenke and Vogels (2021)
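To make the idea concrete, here is a minimal Python/NumPy sketch of such a dataset. It illustrates the concept rather than reproducing the published Randman code: each class is defined by its own smooth random map from a latent manifold coordinate to one firing time per input neuron.

```python
import numpy as np

def smooth_random_map(n_neurons, n_basis=4, alpha=2.0, seed=0):
    """Smooth random map from a latent coordinate x in [0, 1]
    to one normalized firing time per neuron."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, n_basis + 1)
    amps = rng.standard_normal((n_neurons, n_basis)) / k ** alpha   # damp high frequencies
    phases = rng.uniform(0, 2 * np.pi, (n_neurons, n_basis))

    def f(x):
        y = (amps * np.sin(2 * np.pi * k * x + phases)).sum(axis=1)
        return 1.0 / (1.0 + np.exp(-y))   # squash smoothly into (0, 1)

    return f

def make_dataset(n_samples=2000, n_neurons=100, t_max=0.1, seed=42):
    """Two classes = two different smooth random manifolds; one spike per input neuron."""
    rng = np.random.default_rng(seed)
    manifolds = [smooth_random_map(n_neurons, seed=c) for c in (0, 1)]
    labels = rng.integers(0, 2, n_samples)
    spike_times = np.stack([manifolds[c](rng.random()) * t_max for c in labels])
    return spike_times, labels   # (n_samples, n_neurons) firing times in seconds, class labels
```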
Output: Linear combination of filtered output spike trains
[Figure: readout activation over time (s)]
Max-over-time readout idea from the Tempotron: Gütig & Sompolinsky (2006); Gütig (2016)
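A minimal PyTorch sketch of this readout, assuming the output spike trains come as a dense tensor of shape (batch, time, units); the time constants and shapes are placeholders. The spikes are filtered with a leaky integrator, linearly combined, and the maximum over time is taken as the class score.

```python
import torch

def max_over_time_readout(out_spikes, weights, tau=20e-3, dt=1e-3):
    """out_spikes: (batch, time, n_units) output-layer spike trains.
    weights:    (n_units, n_classes) linear readout weights.
    Returns class scores of shape (batch, n_classes)."""
    decay = torch.exp(torch.tensor(-dt / tau))
    batch, n_steps, n_units = out_spikes.shape
    trace = torch.zeros(batch, n_units)
    scores = torch.full((batch, weights.shape[1]), float("-inf"))
    for t in range(n_steps):
        trace = decay * trace + out_spikes[:, t]          # leaky filter of the spikes
        scores = torch.maximum(scores, trace @ weights)   # running maximum over time
    return scores  # feed into, e.g., torch.nn.functional.cross_entropy
```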
Learning: Training spiking networks end-to-end
● Spiking neurons and networks are RNNs
● They have implicit recurrence (the neuronal dynamics) and explicit recurrence (activity going through recurrent synaptic connections)
● Known training procedures exist for networks with hidden units: backpropagation through time (BPTT) and real-time recurrent learning (RTRL)
The dynamics are discretized with forward Euler integration.
Neftci, Mostafa, & Zenke (2019)
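As an illustration, a minimal PyTorch sketch of one forward-Euler step of a recurrent leaky integrate-and-fire layer (the parameter values are placeholders): the membrane and synaptic state variables carry the implicit recurrence, while the recurrent weight matrix provides the explicit recurrence.

```python
import torch

def lif_step(inp_spikes, rec_spikes, v, syn, w_in, w_rec,
             tau_mem=10e-3, tau_syn=5e-3, dt=1e-3, v_th=1.0):
    """One forward-Euler update of a recurrent LIF layer.
    Implicit recurrence: v and syn depend on their own past values.
    Explicit recurrence: the layer's own spikes fed back through w_rec."""
    alpha = torch.exp(torch.tensor(-dt / tau_mem))   # membrane decay per step
    beta = torch.exp(torch.tensor(-dt / tau_syn))    # synaptic decay per step
    syn = beta * syn + inp_spikes @ w_in + rec_spikes @ w_rec
    v = alpha * v + (1 - alpha) * syn                # leaky integration of synaptic current
    out_spikes = (v > v_th).float()                  # hard threshold (see surrogate gradients below)
    v = v - out_spikes * v_th                        # reset by subtraction
    return out_spikes, v, syn
```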
Example: A binary classification problem
Synthetic data set: 2000 samples from two smooth random manifolds
[Figure: activity snapshots; the network has not yet learned anything]
Learning: Training spiking networks end-to-end (continued)
Problem: both BPTT and RTRL require differentiating through the spiking nonlinearity in the forward-Euler dynamics.
Neftci, Mostafa, & Zenke (2019)
Problem: The derivative of a spike train is zero almost everywhere
The loss depends on the parameters only through the spike trains: by the chain rule, the gradient contains the derivative of the spiking output with respect to the membrane potential, which is zero almost everywhere (and ill-defined at threshold). The gradient therefore provides no useful learning signal.
Neftci, Mostafa, & Zenke (2019)
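The surrogate gradient fix, sketched below as a PyTorch custom autograd function (the SuperSpike-style fast-sigmoid surrogate is one common choice; the scale constant is a placeholder): keep the hard threshold in the forward pass, but replace its ill-defined derivative with a smooth surrogate in the backward pass.

```python
import torch

class SurrGradSpike(torch.autograd.Function):
    """Heaviside spike nonlinearity with a fast-sigmoid surrogate derivative."""
    scale = 10.0  # steepness of the surrogate; a hyperparameter

    @staticmethod
    def forward(ctx, v_minus_th):
        ctx.save_for_backward(v_minus_th)
        return (v_minus_th > 0).float()          # hard threshold in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        (v_minus_th,) = ctx.saved_tensors
        sg = 1.0 / (SurrGradSpike.scale * v_minus_th.abs() + 1.0) ** 2
        return grad_output * sg                  # smooth surrogate in the backward pass

spike_fn = SurrGradSpike.apply  # drop-in replacement for (v > v_th).float()
```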
Surrogate gradients: A tool for task-optimizing spiking neural networks
Objective: classify spoken words
Dataset: Spiking Heidelberg Digits (SHD), Cramer, Stradmann, Schemmel, and Zenke (2020); available at www.compneuro.net
Setup: input spike data processed by recurrent spiking layers (Areas 1–3, with excitatory and inhibitory units) and a linear decoder at the output
Result: ~95% classification accuracy
[Figure: input spike raster (channels/neurons over ~100 ms) and network activity in Areas 1–3]
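A schematic sketch of how such a network can be trained end to end by combining the pieces above (spike_fn and the leaky dynamics); layer sizes, time constants, and the readout are placeholder assumptions, not the published SHD setup.

```python
import torch

# Placeholder sizes and constants for an SHD-like task.
n_in, n_hid, n_classes = 700, 128, 20
dt, tau_mem, tau_syn, v_th = 1e-3, 10e-3, 5e-3, 1.0
alpha = torch.exp(torch.tensor(-dt / tau_mem))
beta = torch.exp(torch.tensor(-dt / tau_syn))

w_in = torch.empty(n_in, n_hid).normal_(0, n_in ** -0.5).requires_grad_()
w_rec = torch.empty(n_hid, n_hid).normal_(0, n_hid ** -0.5).requires_grad_()
w_ro = torch.empty(n_hid, n_classes).normal_(0, n_hid ** -0.5).requires_grad_()
opt = torch.optim.Adam([w_in, w_rec, w_ro], lr=1e-3)

def forward(x):                                # x: (batch, time, n_in) dense spike tensor
    batch, n_steps, _ = x.shape
    v = torch.zeros(batch, n_hid)
    syn = torch.zeros(batch, n_hid)
    s = torch.zeros(batch, n_hid)
    trace = torch.zeros(batch, n_hid)
    scores = torch.full((batch, n_classes), float("-inf"))
    for t in range(n_steps):
        syn = beta * syn + x[:, t] @ w_in + s @ w_rec
        v = alpha * v + (1 - alpha) * syn
        s = spike_fn(v - v_th)                 # surrogate-gradient spike (defined above)
        v = v - s.detach() * v_th              # reset; detached so it carries no gradient
        trace = beta * trace + s               # filtered hidden spikes
        scores = torch.maximum(scores, trace @ w_ro)   # linear decoder, max over time
    return scores

# One training step on a batch (x, y) from an SHD-style data loader:
# loss = torch.nn.functional.cross_entropy(forward(x), y)
# opt.zero_grad(); loss.backward(); opt.step()
```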
Can analog neuromorphic substrates self-calibrate through learning and thereby overcome the problem of device mismatch?
BrainScaleS neuromorphic system
With Karlheinz Meier, Sebastian Billaudelle, Benjamin Cramer, and Johannes Schemmel (Uni Heidelberg)
Problem: Device mismatch is a stumbling block for the widespread use of analog neuromorphic hardware
[Figure: a network reaching ~100% accuracy in software simulation loses substantial accuracy when its weights are naively transferred to the analog substrate; restoring performance traditionally requires costly calibration]
To study this question, we used the BrainScaleS-2 analog neuromorphic hardware system
With Sebastian Billaudelle and Benjamin Cramer
Cramer, B., Billaudelle, S., Kanya, S., Leibfried, A., Grübl, A., Karasenko, V., Pehle, C., Schreiber, K., Stradmann, Y., Weis, J., Schemmel, J., and Zenke, F. (2022). PNAS
In-the-loop surrogate gradient training: forward pass on chip, backward pass in software
1) Run the forward pass on the chip (in-the-loop training)
2) Measure the on-chip analog voltage traces
3) Inject the measured voltages into the computational graph
4) Compute surrogate gradients in software → update the weights
Cramer et al. (2022), PNAS
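To illustrate the control flow, here is a conceptual Python sketch of one in-the-loop step. The chip interface (chip.write_weights, chip.run) and the helpers simulate_membrane and readout are hypothetical placeholders, not the actual BrainScaleS-2 software stack, and the detach trick is just one way to splice measured voltages into a differentiable graph.

```python
import torch

def in_the_loop_step(chip, weights, x, y, optimizer, v_th=1.0):
    """Conceptual sketch of one in-the-loop update.
    `chip`, `simulate_membrane`, and `readout` are hypothetical placeholders."""
    # 1) forward pass on the analog chip with the current weights
    chip.write_weights(weights.detach().cpu().numpy())       # hypothetical hardware call
    # 2) measure on-chip analog voltage traces
    v_meas = torch.as_tensor(chip.run(x))                    # hypothetical hardware call
    # 3) inject the measured voltages into the computational graph:
    #    simulate the same dynamics in software with the differentiable weights,
    #    then overwrite the forward values with the hardware measurements
    v_sim = simulate_membrane(weights, x)                    # software model of the chip (placeholder)
    v = v_sim + (v_meas - v_sim).detach()                    # forward = measured, gradient = simulated
    out = spike_fn(v - v_th)                                 # surrogate-gradient spike function
    # 4) compute surrogate gradients in software and update the weights
    loss = torch.nn.functional.cross_entropy(readout(out), y)  # readout is a placeholder decoder
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```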
Functional spiking neural networks trained on accelerated analog neuromorphic hardware
>80k images/sec at 200 mW
Cramer et al. (2022), PNAS
Ignoring explicit recurrence in the gradient computation yields plausible three-factor online learning rules*
(* we still have to worry about spatial credit assignment)
Backpropagation through time vs. real-time recurrent learning (Williams and Zipser, 1989) vs. approximate RTRL
Approximate RTRL:
● Sparse Jacobians → efficient computation
● Local learning rules
Great review: Marschall, O., Cho, K., and Savin, C. (2019)
Zenke and Neftci (2021)
Ignoring explicit recurrence in the gradient computation yields plausible three-factor online learning rules*
(* we still have to worry about spatial credit assignment)
Ingredients of the resulting rule:
● Hebbian part: a voltage-based (surrogate) nonlinearity of the postsynaptic neuron multiplied by a presynaptic filter, e.g., the presence of neurotransmitter at the PSD
● Eligibility trace (“Ca transient”): a temporal filter of the Hebbian part
● Spatial credit assignment handled by a third, error-carrying factor (feedback alignment, burstprop, etc.)
All recent online learning rules use this trick, e.g., SuperSpike, e-prop, RFLO, OSTL, ...
Zenke and Ganguli (2018), Neural Computation; Zenke and Neftci (2021), Proceedings of the IEEE
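A minimal PyTorch sketch of such an online three-factor update for one feed-forward weight matrix (SuperSpike-flavored; the time constants, surrogate shape, and how the error signal arrives are placeholder assumptions).

```python
import torch

def three_factor_step(w, pre_spikes, v_post, error, state,
                      dt=1e-3, tau_pre=5e-3, tau_elig=10e-3, lr=1e-3, v_th=1.0, scale=10.0):
    """One online update of a weight matrix w (n_pre x n_post).
    pre_spikes: (n_pre,) presynaptic spikes at this time step
    v_post:     (n_post,) postsynaptic membrane potentials
    error:      (n_post,) third factor, e.g., an error signal fed back to this layer
    state:      dict holding the presynaptic and eligibility traces between calls."""
    # presynaptic filter, e.g., neurotransmitter present at the PSD
    state["pre"] = state["pre"] * (1 - dt / tau_pre) + pre_spikes
    # voltage-based surrogate nonlinearity of the postsynaptic neuron
    sg = 1.0 / (scale * (v_post - v_th).abs() + 1.0) ** 2
    # Hebbian part: outer product of presynaptic trace and postsynaptic nonlinearity
    hebb = torch.outer(state["pre"], sg)
    # eligibility trace ("Ca transient"): low-pass filter of the Hebbian part
    state["elig"] = state["elig"] * (1 - dt / tau_elig) + hebb
    # three-factor update: eligibility trace gated by the error signal
    w = w + lr * state["elig"] * error            # error broadcasts over the presynaptic index
    return w, state
```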
Approximate learning rules still take advantage of recurrence and do almost as well as full BPTT
[Figure: accuracy on speech processing problems (which require memory), comparing networks without recurrent connections, recurrent networks trained with full BPTT, and recurrent networks trained while ignoring explicit recurrence in the gradient]
Zenke and Neftci (2021)
Fluctuation-driven firing, as observed in the brain, is an ideal starting point for learning in spiking neural nets
With Julian Rossbroich and Julia Gygax (unpublished)
Based on balanced-network theory by van Vreeswijk, Sompolinsky, Brunel, ...
[Figure panels: spiking conv nets; firing rate homeostasis]
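Since this work is unpublished, the sketch below only illustrates the general idea from balanced-network theory rather than the actual method: start from 1/sqrt(fan-in) weight scaling and rescale until the simulated subthreshold membrane potential fluctuates with a target standard deviation relative to threshold. simulate_v and target_std are placeholder assumptions.

```python
import torch

def fluctuation_driven_init(n_pre, n_post, input_spikes, simulate_v,
                            target_std=0.2, v_th=1.0, n_iter=3):
    """Illustration only (the slide's method is unpublished): initialize weights
    so that the network operates in the fluctuation-driven regime.
    simulate_v(w, input_spikes) is a user-supplied function, e.g., built from the
    LIF sketch above, returning membrane traces of shape (batch, time, n_post)."""
    w = torch.randn(n_pre, n_post) / n_pre ** 0.5       # 1/sqrt(fan-in) starting point
    for _ in range(n_iter):
        v = simulate_v(w, input_spikes)                 # measure membrane fluctuations on data
        w = w * (target_std * v_th) / (v.std() + 1e-9)  # rescale toward the target fluctuation size
    return w
```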
Conclusion and summary
● Surrogate gradients allow flexible end-to-end training of spiking neural networks
● Voltage-dependent plasticity adds self-calibration capabilities to analog neuromorphic substrates
● Yes, we can avoid backpropagation through time! Approximate RTRL → three-factor learning rules
Thanks!
Lab: Manu, Julian, Julia, Manvi, Axel, Peter
Alumni: Nikolaos Papanikolaou, Yue Kris Wu, Matthias Depoortere, Tianlin Liu
Collaborators: Benjamin Cramer and Sebastian Billaudelle, with Karlheinz Meier † and Johannes Schemmel (Uni Heidelberg); Emre Neftci (FZ Jülich)
Artwork: kyadava.net
Hiring post-docs and PhD students!