Brains rely on spiking neural networks for ultra-low-power information processing. Building artificial intelligence with similar efficiency requires learning algorithms that can instantiate complex spiking neural networks and brain-inspired neuromorphic hardware that emulates them efficiently. Toward this end, I will briefly introduce surrogate gradients as a general framework for training spiking neural networks and showcase their robustness and self-calibration capabilities on analog neuromorphic hardware. Drawing further inspiration from biology, I will discuss the impact of homeostatic plasticity and of network initialization in the excitatory-inhibitory balanced regime on deep spiking neural network training. Finally, I will show how approximations relate surrogate gradients to biologically plausible online learning rules with only a minor impact on their effectiveness.
Information processing with artificial spiking neural networks
Friedemann Zenke
Computational Neuroscience @ FMI
Affiliated with the Novartis Institutes for BioMedical Research; an affiliated institute of the University of Basel
www.zenkelab.org
Some ways to improve the efficiency of deep nets for AI
● Improve models (inductive biases, architecture)
● Continual learning
● Unsupervised learning (data efficiency)
● Reduce data transfer and communication (avoid the von Neumann bottleneck)
● Use of analog and mixed-signal circuitry (in-memory computing, analog neurons/synapses)
● Sparsity of activity and connections (e.g., event-based processing, network pruning)
Today’s focus is on spiking neural networks
Training spiking neural networks
● Network translation: usually makes a rate-coding assumption and results in high activity levels.
● Direct training (surrogate gradients): very general, no coding assumptions, truly end-to-end.
Today’s focus is on spiking neural networks
● Part 1: Brief intro on surrogate gradients
● Part 2: Self-calibrating analog neuromorphic hardware
● Part 3: Can we do without backprop through time? What’s the optimal initialization for spiking nets?
Input: Spatiotemporal spike trains
Example for this talk: “Randman”, synthetic spike times generated from smooth random manifolds; a binary discrimination task.
Zenke and Vogels (2021)
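To make the idea concrete, here is a minimal Python/NumPy sketch of such a dataset. It illustrates the concept rather than reproducing the published Randman code: each class is defined by its own smooth random map from a latent manifold coordinate to one firing time per input neuron.

```python
import numpy as np

def smooth_random_map(n_neurons, n_basis=4, alpha=2.0, seed=0):
    """Smooth random map from a latent coordinate x in [0, 1]
    to one normalized firing time per neuron."""
    rng = np.random.default_rng(seed)
    k = np.arange(1, n_basis + 1)
    amps = rng.standard_normal((n_neurons, n_basis)) / k ** alpha   # damp high frequencies
    phases = rng.uniform(0, 2 * np.pi, (n_neurons, n_basis))

    def f(x):
        y = (amps * np.sin(2 * np.pi * k * x + phases)).sum(axis=1)
        return 1.0 / (1.0 + np.exp(-y))   # squash smoothly into (0, 1)

    return f

def make_dataset(n_samples=2000, n_neurons=100, t_max=0.1, seed=42):
    """Two classes = two different smooth random manifolds; one spike per input neuron."""
    rng = np.random.default_rng(seed)
    manifolds = [smooth_random_map(n_neurons, seed=c) for c in (0, 1)]
    labels = rng.integers(0, 2, n_samples)
    spike_times = np.stack([manifolds[c](rng.random()) * t_max for c in labels])
    return spike_times, labels   # (n_samples, n_neurons) firing times in seconds, class labels
```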
Output: Linear combination of filtered output spike trains
[Figure: readout activation over time (s)]
Max-over-time readout idea from the Tempotron: Gütig & Sompolinsky (2006); Gütig (2016)
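A minimal PyTorch sketch of this readout, assuming the output spike trains come as a dense tensor of shape (batch, time, units); the time constants and shapes are placeholders. The spikes are filtered with a leaky integrator, linearly combined, and the maximum over time is taken as the class score.

```python
import torch

def max_over_time_readout(out_spikes, weights, tau=20e-3, dt=1e-3):
    """out_spikes: (batch, time, n_units) output-layer spike trains.
    weights:    (n_units, n_classes) linear readout weights.
    Returns class scores of shape (batch, n_classes)."""
    decay = torch.exp(torch.tensor(-dt / tau))
    batch, n_steps, n_units = out_spikes.shape
    trace = torch.zeros(batch, n_units)
    scores = torch.full((batch, weights.shape[1]), float("-inf"))
    for t in range(n_steps):
        trace = decay * trace + out_spikes[:, t]          # leaky filter of the spikes
        scores = torch.maximum(scores, trace @ weights)   # running maximum over time
    return scores  # feed into, e.g., torch.nn.functional.cross_entropy
```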
Learning: Training spiking networks end-to-end
● Spiking neurons and networks are RNNs
● They have implicit recurrence (the neuronal dynamics) and explicit recurrence (activity going through recurrent synaptic connections)
● Known training procedures exist for networks with hidden units: backpropagation through time (BPTT) and real-time recurrent learning (RTRL)
The dynamics are discretized with forward Euler integration.
Neftci, Mostafa, & Zenke (2019)
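As an illustration, a minimal PyTorch sketch of one forward-Euler step of a recurrent leaky integrate-and-fire layer (the parameter values are placeholders): the membrane and synaptic state variables carry the implicit recurrence, while the recurrent weight matrix provides the explicit recurrence.

```python
import torch

def lif_step(inp_spikes, rec_spikes, v, syn, w_in, w_rec,
             tau_mem=10e-3, tau_syn=5e-3, dt=1e-3, v_th=1.0):
    """One forward-Euler update of a recurrent LIF layer.
    Implicit recurrence: v and syn depend on their own past values.
    Explicit recurrence: the layer's own spikes fed back through w_rec."""
    alpha = torch.exp(torch.tensor(-dt / tau_mem))   # membrane decay per step
    beta = torch.exp(torch.tensor(-dt / tau_syn))    # synaptic decay per step
    syn = beta * syn + inp_spikes @ w_in + rec_spikes @ w_rec
    v = alpha * v + (1 - alpha) * syn                # leaky integration of synaptic current
    out_spikes = (v > v_th).float()                  # hard threshold (see surrogate gradients below)
    v = v - out_spikes * v_th                        # reset by subtraction
    return out_spikes, v, syn
```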
Example: A binary classification problem
Synthetic data set: 2000 samples from two smooth random manifolds
[Figure: activity snapshots; the network has not yet learned anything]
Learning: Training spiking networks end-to-end (continued)
Problem: both BPTT and RTRL require differentiating through the spiking nonlinearity in the forward-Euler dynamics.
Neftci, Mostafa, & Zenke (2019)
Problem: The derivative of a spike train is zero almost everywhere
The loss depends on the parameters only through the spike trains: by the chain rule, the gradient contains the derivative of the spiking output with respect to the membrane potential, which is zero almost everywhere (and ill-defined at threshold). The gradient therefore provides no useful learning signal.
Neftci, Mostafa, & Zenke (2019)
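The surrogate gradient fix, sketched below as a PyTorch custom autograd function (the SuperSpike-style fast-sigmoid surrogate is one common choice; the scale constant is a placeholder): keep the hard threshold in the forward pass, but replace its ill-defined derivative with a smooth surrogate in the backward pass.

```python
import torch

class SurrGradSpike(torch.autograd.Function):
    """Heaviside spike nonlinearity with a fast-sigmoid surrogate derivative."""
    scale = 10.0  # steepness of the surrogate; a hyperparameter

    @staticmethod
    def forward(ctx, v_minus_th):
        ctx.save_for_backward(v_minus_th)
        return (v_minus_th > 0).float()          # hard threshold in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        (v_minus_th,) = ctx.saved_tensors
        sg = 1.0 / (SurrGradSpike.scale * v_minus_th.abs() + 1.0) ** 2
        return grad_output * sg                  # smooth surrogate in the backward pass

spike_fn = SurrGradSpike.apply  # drop-in replacement for (v > v_th).float()
```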
Surrogate gradients: A tool for task-optimizing spiking neural networks
Objective: classify spoken words
Dataset: Spiking Heidelberg Digits (SHD), Cramer, Stradmann, Schemmel, and Zenke (2020); available at www.compneuro.net
Setup: input spike data processed by recurrent spiking layers (Areas 1–3, with excitatory and inhibitory units) and a linear decoder at the output
Result: ~95% classification accuracy
[Figure: input spike raster (channels/neurons over ~100 ms) and network activity in Areas 1–3]
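A schematic sketch of how such a network can be trained end to end by combining the pieces above (spike_fn and the leaky dynamics); layer sizes, time constants, and the readout are placeholder assumptions, not the published SHD setup.

```python
import torch

# Placeholder sizes and constants for an SHD-like task.
n_in, n_hid, n_classes = 700, 128, 20
dt, tau_mem, tau_syn, v_th = 1e-3, 10e-3, 5e-3, 1.0
alpha = torch.exp(torch.tensor(-dt / tau_mem))
beta = torch.exp(torch.tensor(-dt / tau_syn))

w_in = torch.empty(n_in, n_hid).normal_(0, n_in ** -0.5).requires_grad_()
w_rec = torch.empty(n_hid, n_hid).normal_(0, n_hid ** -0.5).requires_grad_()
w_ro = torch.empty(n_hid, n_classes).normal_(0, n_hid ** -0.5).requires_grad_()
opt = torch.optim.Adam([w_in, w_rec, w_ro], lr=1e-3)

def forward(x):                                # x: (batch, time, n_in) dense spike tensor
    batch, n_steps, _ = x.shape
    v = torch.zeros(batch, n_hid)
    syn = torch.zeros(batch, n_hid)
    s = torch.zeros(batch, n_hid)
    trace = torch.zeros(batch, n_hid)
    scores = torch.full((batch, n_classes), float("-inf"))
    for t in range(n_steps):
        syn = beta * syn + x[:, t] @ w_in + s @ w_rec
        v = alpha * v + (1 - alpha) * syn
        s = spike_fn(v - v_th)                 # surrogate-gradient spike (defined above)
        v = v - s.detach() * v_th              # reset; detached so it carries no gradient
        trace = beta * trace + s               # filtered hidden spikes
        scores = torch.maximum(scores, trace @ w_ro)   # linear decoder, max over time
    return scores

# One training step on a batch (x, y) from an SHD-style data loader:
# loss = torch.nn.functional.cross_entropy(forward(x), y)
# opt.zero_grad(); loss.backward(); opt.step()
```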
Can analog neuromorphic substrates self-calibrate through learning and thereby overcome the problem of device mismatch?
BrainScaleS neuromorphic system
With Karlheinz Meier, Sebastian Billaudelle, Benjamin Cramer, and Johannes Schemmel (Uni Heidelberg)
Problem: Device mismatch is a stumbling block for the widespread use of analog neuromorphic hardware
[Figure: a network reaching ~100% accuracy in software simulation loses substantial accuracy when its weights are naively transferred to the analog substrate; restoring performance traditionally requires costly calibration]
To study this question, we used the BrainScaleS-2 analog neuromorphic hardware system
With Sebastian Billaudelle and Benjamin Cramer
Cramer, B., Billaudelle, S., Kanya, S., Leibfried, A., Grübl, A., Karasenko, V., Pehle, C., Schreiber, K., Stradmann, Y., Weis, J., Schemmel, J., and Zenke, F. (2022). PNAS
In-the-loop surrogate gradient training: forward pass on chip, backward pass in software
1) Run the forward pass on the chip (in-the-loop training)
2) Measure the on-chip analog voltage traces
3) Inject the measured voltages into the computational graph
4) Compute surrogate gradients in software → update the weights
Cramer et al. (2022), PNAS
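To illustrate the control flow, here is a conceptual Python sketch of one in-the-loop step. The chip interface (chip.write_weights, chip.run) and the helpers simulate_membrane and readout are hypothetical placeholders, not the actual BrainScaleS-2 software stack, and the detach trick is just one way to splice measured voltages into a differentiable graph.

```python
import torch

def in_the_loop_step(chip, weights, x, y, optimizer, v_th=1.0):
    """Conceptual sketch of one in-the-loop update.
    `chip`, `simulate_membrane`, and `readout` are hypothetical placeholders."""
    # 1) forward pass on the analog chip with the current weights
    chip.write_weights(weights.detach().cpu().numpy())       # hypothetical hardware call
    # 2) measure on-chip analog voltage traces
    v_meas = torch.as_tensor(chip.run(x))                    # hypothetical hardware call
    # 3) inject the measured voltages into the computational graph:
    #    simulate the same dynamics in software with the differentiable weights,
    #    then overwrite the forward values with the hardware measurements
    v_sim = simulate_membrane(weights, x)                    # software model of the chip (placeholder)
    v = v_sim + (v_meas - v_sim).detach()                    # forward = measured, gradient = simulated
    out = spike_fn(v - v_th)                                 # surrogate-gradient spike function
    # 4) compute surrogate gradients in software and update the weights
    loss = torch.nn.functional.cross_entropy(readout(out), y)  # readout is a placeholder decoder
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```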
Functional spiking neural networks trained on accelerated analog neuromorphic hardware
>80k images/sec at 200 mW
Cramer et al. (2022), PNAS
Ignoring explicit recurrence in the gradient computation yields plausible three-factor online learning rules*
(* we still have to worry about spatial credit assignment)
Backpropagation through time vs. real-time recurrent learning (Williams and Zipser, 1989) vs. approximate RTRL
Approximate RTRL:
● Sparse Jacobians → efficient computation
● Local learning rules
Great review: Marschall, O., Cho, K., and Savin, C. (2019)
Zenke and Neftci (2021)
Ignoring explicit recurrence in the gradient computation yields plausible three-factor online learning rules*
(* we still have to worry about spatial credit assignment)
Ingredients of the resulting rule:
● Hebbian part: a voltage-based (surrogate) nonlinearity of the postsynaptic neuron multiplied by a presynaptic filter, e.g., the presence of neurotransmitter at the PSD
● Eligibility trace (“Ca transient”): a temporal filter of the Hebbian part
● Spatial credit assignment handled by a third, error-carrying factor (feedback alignment, burstprop, etc.)
All recent online learning rules use this trick, e.g., SuperSpike, e-prop, RFLO, OSTL, ...
Zenke and Ganguli (2018), Neural Computation; Zenke and Neftci (2021), Proceedings of the IEEE
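A minimal PyTorch sketch of such an online three-factor update for one feed-forward weight matrix (SuperSpike-flavored; the time constants, surrogate shape, and how the error signal arrives are placeholder assumptions).

```python
import torch

def three_factor_step(w, pre_spikes, v_post, error, state,
                      dt=1e-3, tau_pre=5e-3, tau_elig=10e-3, lr=1e-3, v_th=1.0, scale=10.0):
    """One online update of a weight matrix w (n_pre x n_post).
    pre_spikes: (n_pre,) presynaptic spikes at this time step
    v_post:     (n_post,) postsynaptic membrane potentials
    error:      (n_post,) third factor, e.g., an error signal fed back to this layer
    state:      dict holding the presynaptic and eligibility traces between calls."""
    # presynaptic filter, e.g., neurotransmitter present at the PSD
    state["pre"] = state["pre"] * (1 - dt / tau_pre) + pre_spikes
    # voltage-based surrogate nonlinearity of the postsynaptic neuron
    sg = 1.0 / (scale * (v_post - v_th).abs() + 1.0) ** 2
    # Hebbian part: outer product of presynaptic trace and postsynaptic nonlinearity
    hebb = torch.outer(state["pre"], sg)
    # eligibility trace ("Ca transient"): low-pass filter of the Hebbian part
    state["elig"] = state["elig"] * (1 - dt / tau_elig) + hebb
    # three-factor update: eligibility trace gated by the error signal
    w = w + lr * state["elig"] * error            # error broadcasts over the presynaptic index
    return w, state
```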
Approximate learning rules still take advantage of recurrence and do almost as well as full BPTT
[Figure: accuracy on speech processing problems (which require memory), comparing networks without recurrent connections, recurrent networks trained with full BPTT, and recurrent networks trained while ignoring explicit recurrence in the gradient]
Zenke and Neftci (2021)
Fluctuation-driven firing, as observed in the brain, is an ideal starting point for learning in spiking neural nets
With Julian Rossbroich and Julia Gygax (unpublished)
Based on balanced-network theory by van Vreeswijk, Sompolinsky, Brunel, ...
[Figure panels: spiking conv nets; firing rate homeostasis]
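Since this work is unpublished, the sketch below only illustrates the general idea from balanced-network theory rather than the actual method: start from 1/sqrt(fan-in) weight scaling and rescale until the simulated subthreshold membrane potential fluctuates with a target standard deviation relative to threshold. simulate_v and target_std are placeholder assumptions.

```python
import torch

def fluctuation_driven_init(n_pre, n_post, input_spikes, simulate_v,
                            target_std=0.2, v_th=1.0, n_iter=3):
    """Illustration only (the slide's method is unpublished): initialize weights
    so that the network operates in the fluctuation-driven regime.
    simulate_v(w, input_spikes) is a user-supplied function, e.g., built from the
    LIF sketch above, returning membrane traces of shape (batch, time, n_post)."""
    w = torch.randn(n_pre, n_post) / n_pre ** 0.5       # 1/sqrt(fan-in) starting point
    for _ in range(n_iter):
        v = simulate_v(w, input_spikes)                 # measure membrane fluctuations on data
        w = w * (target_std * v_th) / (v.std() + 1e-9)  # rescale toward the target fluctuation size
    return w
```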
Conclusion and summary
● Surrogate gradients allow flexible end-to-end training of spiking neural networks
● Voltage-dependent plasticity adds self-calibration capabilities to analog neuromorphic substrates
● Yes, we can avoid backpropagation through time! Approximate RTRL → three-factor learning rules
Thanks!
Lab: Manu, Julian, Julia, Manvi, Axel, Peter
Alumni: Nikolaos Papanikolaou, Yue Kris Wu, Matthias Depoortere, Tianlin Liu
Collaborators: Benjamin Cramer and Sebastian Billaudelle, with Karlheinz Meier † and Johannes Schemmel (Uni Heidelberg); Emre Neftci (FZ Jülich)
Artwork: kyadava.net
Hiring post-docs and PhD students!