Weight Agnostic Neural Networks
2019.08.13
Yongsu Baek
yongsubaek@mli.kaist.ac.kr
Table of Contents
• Overview
• Motivation
• Related Work
• Architecture search
• Bayesian Neural Networks
• Algorithmic Information Theory (AIT)
• Network Pruning
• Neuroscience
• WANN
• Overview
• Topology Search
• Performance and Complexity
• Results
• Continuous control tasks
• Image Classification
• Discussion
Overview
• Not gradient-based <-> gradient-based training
• Architecture search only <-> weight parameter training
• Evolution <-> rearrangement / pruning
Motivation - Biology
• “In biology, precocial species are those whose young already possess certain
abilities from the moment of birth.”
Motivation – Deep learning
• Network Structure - "Strong inductive biases"
• Convolutional networks [2], [3]
• LSTM [4]
Goal
• Weight Agnostic Neural Network
• Architectures with "strong inductive biases"
• can already perform various tasks with random weights.
• Find network structures that can already perform tasks well without any weight training!
• By deemphasizing the importance of weights
1) Single shared weight
2) Evaluation over a wide range of values for the single weight parameter
• Novel neural network building blocks
Related Work
• Architecture search
• Bayesian Neural Networks
• Algorithmic Information Theory (AIT)
• Network Pruning
• Neuroscience
Architecture Search
• Evolutionary computing
• Topology Search Algorithm - NEAT [5]
• NAS
• Basic building blocks with strong domain priors: CNNs, recurrent cells, self-attention
• Weight-training inner loop -> slow
• Once trained, the discovered architectures outperform human-designed ones
• WANN
• Creating network architectures which encode solutions
• No training inner loop
• The solution is innate to the structure
Bayesian Neural Networks
• Weight parameters sampled from learned distribution
• Variance Network [6]
• Weights sampled from a zero-mean distribution with parameterized variance
• conventional BNNs naturally converge to zero-mean posteriors
• Ensemble evaluation
• WANN
• sampling weights from a fixed uniform distribution with zero mean
• evaluating performance on network ensembles
Variance Networks: When Expectation Does Not Meet Your Expectations, K. Neklyudov et al., ICLR 2019
Algorithmic Information Theory (AIT)
• Kolmogorov complexity
• The length of the shortest program that can compute a given object
• Occam’s razor
• Simplifying neural networks by soft weight-sharing [7]
• reducing the amount of information in weights by making them noisy, and simplifying
the search space
• WANN
• finding minimal architectures
• Weight-sharing to the entire network (AIT)
• The weight as a random variable sampled from a fixed distribution (BNN)
Simplifying Neural Networks by Soft Weight-Sharing, S.J. Nowlan, G.E. Hinton, 1992
Network Pruning
• starts with a full, trained network, and takes away connections
• Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask (2019)
• pruned networks w/ randomly initialized weights
• WANN
• complementary to pruning
• does not require prior training
• no upper bound on the network’s complexity
Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask, H. Zhou, J. Lan, R. Liu, J. Yosinski, 2019
Neuroscience
• connectome
• “wiring diagram” of all neural connections
• organisms keep forming new synaptic connections and rewiring
• analyzed using graph theory
• WANN
• aims to learn network graphs that can encode skills and knowledge
• ever-growing networks
• small enough to be analyzed
WANN
• Weight of WANN
• Searching Method
• Topology Search
• Performance and Complexity
Weight of WANN
• Architectures themselves should encode solutions
• The importance of weights must be minimized
• Sampling individual weights: impractical (curse of dimensionality)
• Weight-sharing: efficient and keeps the search tractable
• -> a single shared weight sampled from a fixed distribution
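As a sketch, evaluating a feed-forward WANN in which every connection carries the same shared weight value could look like this (the node/connection representation here is our own assumption, not the paper's code):

```python
import math

def eval_wann(nodes, connections, inputs, w):
    """nodes: list of (node_id, activation) in topological order, input nodes first.
    connections: list of (src, dst) edges; all of them share the single weight w."""
    value = {}
    for (nid, _), x in zip(nodes, inputs):
        value[nid] = x                      # input nodes pass their inputs through
    for nid, act in nodes[len(inputs):]:
        # every incoming edge contributes shared_weight * source_value
        total = sum(w * value[src] for src, dst in connections if dst == nid)
        value[nid] = act(total)
    return value

# tiny example: two inputs feeding one tanh output node
nodes = [(0, None), (1, None), (2, math.tanh)]
conns = [(0, 2), (1, 2)]
out = eval_wann(nodes, conns, [1.0, 1.0], 1.0)   # out[2] == tanh(2.0)
```

Because only the single value `w` changes between rollouts, the same topology can be re-evaluated cheaply across the whole weight series.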
Searching Method
1. An initial population of minimal neural network topologies is created.
2. Each network is evaluated over multiple rollouts, with a different shared weight
value assigned at each rollout.
Searching Method
3. Networks are ranked according to their performance and complexity.
4. A new population is created by varying the highest-ranked network topologies,
chosen probabilistically through tournament selection.
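Steps 1-4 can be sketched as a minimal outer loop. `evaluate`, `mutate`, and the network representation are assumed interfaces, and for brevity the ranking below uses only mean performance (the complexity objective is omitted):

```python
import random

WEIGHT_VALUES = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]

def search(population, evaluate, mutate, generations=50):
    """evaluate(net, w) -> score of one rollout with shared weight w;
    mutate(net) -> varied copy of a topology. Population size must be >= 3."""
    for _ in range(generations):
        # step 2: one rollout per shared weight value, averaged per network
        mean_score = {i: sum(evaluate(net, w) for w in WEIGHT_VALUES) / len(WEIGHT_VALUES)
                      for i, net in enumerate(population)}
        # step 3: rank networks by mean performance (complexity omitted here)
        ranked = sorted(range(len(population)), key=mean_score.get, reverse=True)
        # step 4: tournament selection over the ranking, then variation
        parents = [min(random.sample(ranked, 3), key=ranked.index)
                   for _ in population]
        population = [mutate(population[i]) for i in parents]
    return population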
Topology Search
• NEAT [5]
• one of three ways:
1) Insert Node
2) Add Connection
3) Change Activation
• Feed-forward network
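The three mutation operators above can be sketched as follows. The network representation (`order`, `act`, `conns`, `n_in`) is our own hypothetical encoding, chosen so the graph stays feed-forward by construction:

```python
import copy
import math
import random

ACTIVATIONS = [math.tanh, math.sin, abs, lambda x: x, lambda x: max(0.0, x)]

def mutate(net):
    """net: {'order': node ids in feed-forward order, 'act': id -> activation,
    'conns': set of (src, dst) edges, 'n_in': number of input nodes}."""
    net = copy.deepcopy(net)
    op = random.choice(['insert_node', 'add_connection', 'change_activation'])
    if op == 'insert_node' and net['conns']:
        # 1) split an existing connection with a new hidden node
        src, dst = random.choice(sorted(net['conns']))
        new_id = max(net['act']) + 1
        net['conns'].discard((src, dst))
        net['conns'] |= {(src, new_id), (new_id, dst)}
        net['act'][new_id] = random.choice(ACTIVATIONS)
        net['order'].insert(net['order'].index(dst), new_id)  # stay feed-forward
    elif op == 'add_connection':
        # 2) connect an earlier node to a later one so the graph stays acyclic
        i, j = sorted(random.sample(range(len(net['order'])), 2))
        net['conns'].add((net['order'][i], net['order'][j]))
    else:
        # 3) reassign the activation of a random non-input node
        nid = random.choice(net['order'][net['n_in']:])
        net['act'][nid] = random.choice(ACTIVATIONS)
    return net
```

Note that no mutation touches weight values: the search space is topologies and activations only.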
Evolving Neural Networks Through Augmenting Topologies, K.O. Stanley, R. Miikkulainen, 2002
Performance and Complexity
• evaluated using several shared weight values
• fixed series of weight values [-2, -1, -0.5, +0.5, +1, +2]
• mean performance
• Prefer simpler networks (AIT)
• multi-objective optimization problem:
• mean performance over all weight values
• max performance of the single best weight value
• the number of connections in the network
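One way to compare two candidates on these three objectives is Pareto dominance; a minimal sketch (the paper ranks with a multi-objective scheme, but its exact algorithm is not reproduced here):

```python
def objectives(scores, n_connections):
    # maximize mean and best-case performance, minimize connection count
    return (sum(scores) / len(scores), max(scores), -n_connections)

def dominates(a, b):
    """a, b: (scores over the shared-weight series, number of connections).
    True iff a is at least as good on every objective and better on one."""
    oa, ob = objectives(*a), objectives(*b)
    return all(x >= y for x, y in zip(oa, ob)) and oa != ob

# a small network with better scores dominates a large network with worse ones
small_good = ([1.0, 2.0, 1.5], 5)
big_bad = ([0.5, 1.0, 0.8], 10)
```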
Experimental Results
• Continuous Control
• CartPoleSwingUp
• BipedalWalker-v2
• CarRacing-v0
• Image Classification
• MNIST
Experiment
1. Random weights: individual weights drawn from U(-2, 2)
2. Random shared weight: a single shared weight drawn from U(-2, 2)
3. Tuned shared weight: the highest-performing shared weight value in the range (-2, 2)
4. Tuned weights: individual weights tuned using population-based REINFORCE
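Setups 1-3 can be sketched as small helpers (setup 4, population-based REINFORCE, is omitted; the function names and the grid sweep are our own assumptions, not the paper's):

```python
import random

WEIGHT_RANGE = (-2.0, 2.0)

def random_weights(n_conns):
    # setup 1: every connection gets its own random weight
    return [random.uniform(*WEIGHT_RANGE) for _ in range(n_conns)]

def random_shared_weight(n_conns):
    # setup 2: one random value shared by all connections
    w = random.uniform(*WEIGHT_RANGE)
    return [w] * n_conns

def tuned_shared_weight(n_conns, evaluate, grid=41):
    # setup 3: sweep the range, keep the best-performing single value
    lo, hi = WEIGHT_RANGE
    candidates = [lo + (hi - lo) * i / (grid - 1) for i in range(grid)]
    best = max(candidates, key=lambda ws=None, w=None: 0) if False else \
        max(candidates, key=lambda w: evaluate([w] * n_conns))
    return [best] * n_conns
```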
Continuous Control
• CartPoleSwingUp
• Cannot be solved with a linear controller
Continuous Control
• BipedalWalker-v2
• non-trivial number of possible connections
• 210 connections (SOTA: 2804 connections)
Continuous Control
• CarRacing-v0
• pre-trained VAE to compress the pixel representation
• No pre-trained RNN hidden states are used
Continuous Control Results
• WANNs are not completely independent of the weight values
• A single shared weight makes tuning easy
Classification
• Ensemble evaluation: predictions from multiple shared weight values are combined by voting
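A sketch of this voting scheme: one topology instantiated at several shared weight values acts as an ensemble. `forward(x, w) -> list of class scores` is an assumed interface:

```python
WEIGHT_VALUES = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]

def ensemble_predict(forward, x):
    votes = []
    for w in WEIGHT_VALUES:
        scores = forward(x, w)
        votes.append(scores.index(max(scores)))   # each weight value casts one vote
    return max(set(votes), key=votes.count)       # majority class wins
```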
Discussion and Future Work
• A method to search for simple neural networks
• Fine-tune
• Few-shot learning
• Continual lifelong learning
• Multitask
• Supermask [8]
• reaches a similar range of performance
• performs architecture search in a differentiable manner
Discussion
• Contributions?
• Isolates the influence of the network structure alone
• Shows how much performance simple neural networks can reach
• WANNs themselves do not look practically useful
• The single shared weight induces a bias in the discovered structures
• Worth studying as an optimization method for discovering structures
Thank you!
Any questions?
References
1) Gaier, A. and Ha, D. Weight Agnostic Neural Networks. arXiv preprint arXiv:1906.04358, 2019.
2) He, K., Wang, Y. and Hopcroft, J. A Powerful Generative Model Using Random Weights for the Deep Image Representation. Advances in Neural Information Processing Systems, pp. 631–639, 2016.
3) Ulyanov, D., Vedaldi, A. and Lempitsky, V. Deep Image Prior. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9446–9454, 2018.
4) Schmidhuber, J., Wierstra, D., Gagliolo, M. and Gomez, F. Training Recurrent Networks by Evolino. Neural Computation, Vol. 19(3), pp. 757–779, MIT Press, 2007.
5) Stanley, K.O. and Miikkulainen, R. Evolving Neural Networks Through Augmenting Topologies. Evolutionary Computation, Vol. 10(2), pp. 99–127, MIT Press, 2002.
6) Neklyudov, K., Molchanov, D., Ashukha, A. and Vetrov, D. Variance Networks: When Expectation Does Not Meet Your Expectations. International Conference on Learning Representations (ICLR), 2019.
7) Nowlan, S.J. and Hinton, G.E. Simplifying Neural Networks by Soft Weight-Sharing. Neural Computation, Vol. 4(4), pp. 473–493, MIT Press, 1992.
8) Zhou, H., Lan, J., Liu, R. and Yosinski, J. Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask. arXiv preprint arXiv:1905.01067, 2019.

More Related Content

What's hot

Efficient Neural Network Architecture for Image Classfication
Efficient Neural Network Architecture for Image ClassficationEfficient Neural Network Architecture for Image Classfication
Efficient Neural Network Architecture for Image Classfication
Yogendra Tamang
 
Optimum Relay Node Selection in Clustered MANET
Optimum Relay Node Selection in Clustered MANETOptimum Relay Node Selection in Clustered MANET
Optimum Relay Node Selection in Clustered MANET
IRJET Journal
 
3D 딥러닝 동향
3D 딥러닝 동향3D 딥러닝 동향
3D 딥러닝 동향
NAVER Engineering
 
Basics of Artificial Neural Network
Basics of Artificial Neural Network Basics of Artificial Neural Network
Basics of Artificial Neural Network
Subham Preetam
 
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
Fellowship at Vodafone FutureLab
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
ijceronline
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
IJERD Editor
 
Visualizaing and understanding convolutional networks
Visualizaing and understanding convolutional networksVisualizaing and understanding convolutional networks
Visualizaing and understanding convolutional networks
SungminYou
 
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
ijcsit
 
Optimized Neural Network for Classification of Multispectral Images
Optimized Neural Network for Classification of Multispectral ImagesOptimized Neural Network for Classification of Multispectral Images
Optimized Neural Network for Classification of Multispectral Images
IDES Editor
 
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
Fellowship at Vodafone FutureLab
 
Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013
Pedro Lopes
 
Introduction to Neural networks (under graduate course) Lecture 9 of 9
Introduction to Neural networks (under graduate course) Lecture 9 of 9Introduction to Neural networks (under graduate course) Lecture 9 of 9
Introduction to Neural networks (under graduate course) Lecture 9 of 9
Randa Elanwar
 
Ch 1-1 introduction
Ch 1-1 introductionCh 1-1 introduction
Ch 1-1 introduction
Zahra Amini
 

What's hot (14)

Efficient Neural Network Architecture for Image Classfication
Efficient Neural Network Architecture for Image ClassficationEfficient Neural Network Architecture for Image Classfication
Efficient Neural Network Architecture for Image Classfication
 
Optimum Relay Node Selection in Clustered MANET
Optimum Relay Node Selection in Clustered MANETOptimum Relay Node Selection in Clustered MANET
Optimum Relay Node Selection in Clustered MANET
 
3D 딥러닝 동향
3D 딥러닝 동향3D 딥러닝 동향
3D 딥러닝 동향
 
Basics of Artificial Neural Network
Basics of Artificial Neural Network Basics of Artificial Neural Network
Basics of Artificial Neural Network
 
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Visualizaing and understanding convolutional networks
Visualizaing and understanding convolutional networksVisualizaing and understanding convolutional networks
Visualizaing and understanding convolutional networks
 
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...
 
Optimized Neural Network for Classification of Multispectral Images
Optimized Neural Network for Classification of Multispectral ImagesOptimized Neural Network for Classification of Multispectral Images
Optimized Neural Network for Classification of Multispectral Images
 
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
AlexNet(ImageNet Classification with Deep Convolutional Neural Networks)
 
Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013Poster_Reseau_Neurones_Journees_2013
Poster_Reseau_Neurones_Journees_2013
 
Introduction to Neural networks (under graduate course) Lecture 9 of 9
Introduction to Neural networks (under graduate course) Lecture 9 of 9Introduction to Neural networks (under graduate course) Lecture 9 of 9
Introduction to Neural networks (under graduate course) Lecture 9 of 9
 
Ch 1-1 introduction
Ch 1-1 introductionCh 1-1 introduction
Ch 1-1 introduction
 

Similar to Weight Agnostic Neural Networks

Neural networks1
Neural networks1Neural networks1
Neural networks1
Mohan Raj
 
[RSS2023] Local Object Crop Collision Network for Efficient Simulation
[RSS2023] Local Object Crop Collision Network for Efficient Simulation[RSS2023] Local Object Crop Collision Network for Efficient Simulation
[RSS2023] Local Object Crop Collision Network for Efficient Simulation
DongwonSon1
 
Lecture_04_Supervised_Pretraining.pptx
Lecture_04_Supervised_Pretraining.pptxLecture_04_Supervised_Pretraining.pptx
Lecture_04_Supervised_Pretraining.pptx
GauravGautam216125
 
NEURAL NETWORK IN MACHINE LEARNING FOR STUDENTS
NEURAL NETWORK IN MACHINE LEARNING FOR STUDENTSNEURAL NETWORK IN MACHINE LEARNING FOR STUDENTS
NEURAL NETWORK IN MACHINE LEARNING FOR STUDENTS
hemasubbu08
 
モデル高速化百選
モデル高速化百選モデル高速化百選
モデル高速化百選
Yusuke Uchida
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
DonghyunKang12
 
ANN load forecasting
ANN load forecastingANN load forecasting
ANN load forecasting
Dr Ashok Tiwari
 
Exploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image RecognitionExploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image Recognition
Yongsu Baek
 
network mining and representation learning
network mining and representation learningnetwork mining and representation learning
network mining and representation learning
sun peiyuan
 
artificialneuralnetwork-130409001108-phpapp02 (2).pptx
artificialneuralnetwork-130409001108-phpapp02 (2).pptxartificialneuralnetwork-130409001108-phpapp02 (2).pptx
artificialneuralnetwork-130409001108-phpapp02 (2).pptx
REG83NITHYANANTHANN
 
POSTER
POSTERPOSTER
character_ANN.ppt
character_ANN.pptcharacter_ANN.ppt
character_ANN.ppt
Harsh480253
 
Artificial Neural Networks presentations
Artificial Neural Networks presentationsArtificial Neural Networks presentations
Artificial Neural Networks presentations
migob991
 
Deep Neural Networks.pptx
Deep Neural Networks.pptxDeep Neural Networks.pptx
Deep Neural Networks.pptx
YashPaul20
 
DLD_WeightSharing_Slide
DLD_WeightSharing_SlideDLD_WeightSharing_Slide
DLD_WeightSharing_Slide
Kang-Ho Lee
 
20191107 deeplearningapproachesfornetworks
20191107 deeplearningapproachesfornetworks20191107 deeplearningapproachesfornetworks
20191107 deeplearningapproachesfornetworks
tm1966
 
Introduction to Neural Network
Introduction to Neural NetworkIntroduction to Neural Network
Introduction to Neural Network
Yan Xu
 
Computer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathonComputer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathon
Aditya Bhattacharya
 
Network recasting
Network recastingNetwork recasting
Network recasting
NAVER Engineering
 
FINAL_Team_4.pptx
FINAL_Team_4.pptxFINAL_Team_4.pptx
FINAL_Team_4.pptx
nitin571047
 

Similar to Weight Agnostic Neural Networks (20)

Neural networks1
Neural networks1Neural networks1
Neural networks1
 
[RSS2023] Local Object Crop Collision Network for Efficient Simulation
[RSS2023] Local Object Crop Collision Network for Efficient Simulation[RSS2023] Local Object Crop Collision Network for Efficient Simulation
[RSS2023] Local Object Crop Collision Network for Efficient Simulation
 
Lecture_04_Supervised_Pretraining.pptx
Lecture_04_Supervised_Pretraining.pptxLecture_04_Supervised_Pretraining.pptx
Lecture_04_Supervised_Pretraining.pptx
 
NEURAL NETWORK IN MACHINE LEARNING FOR STUDENTS
NEURAL NETWORK IN MACHINE LEARNING FOR STUDENTSNEURAL NETWORK IN MACHINE LEARNING FOR STUDENTS
NEURAL NETWORK IN MACHINE LEARNING FOR STUDENTS
 
モデル高速化百選
モデル高速化百選モデル高速化百選
モデル高速化百選
 
Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)Cvpr 2018 papers review (efficient computing)
Cvpr 2018 papers review (efficient computing)
 
ANN load forecasting
ANN load forecastingANN load forecasting
ANN load forecasting
 
Exploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image RecognitionExploring Randomly Wired Neural Networks for Image Recognition
Exploring Randomly Wired Neural Networks for Image Recognition
 
network mining and representation learning
network mining and representation learningnetwork mining and representation learning
network mining and representation learning
 
artificialneuralnetwork-130409001108-phpapp02 (2).pptx
artificialneuralnetwork-130409001108-phpapp02 (2).pptxartificialneuralnetwork-130409001108-phpapp02 (2).pptx
artificialneuralnetwork-130409001108-phpapp02 (2).pptx
 
POSTER
POSTERPOSTER
POSTER
 
character_ANN.ppt
character_ANN.pptcharacter_ANN.ppt
character_ANN.ppt
 
Artificial Neural Networks presentations
Artificial Neural Networks presentationsArtificial Neural Networks presentations
Artificial Neural Networks presentations
 
Deep Neural Networks.pptx
Deep Neural Networks.pptxDeep Neural Networks.pptx
Deep Neural Networks.pptx
 
DLD_WeightSharing_Slide
DLD_WeightSharing_SlideDLD_WeightSharing_Slide
DLD_WeightSharing_Slide
 
20191107 deeplearningapproachesfornetworks
20191107 deeplearningapproachesfornetworks20191107 deeplearningapproachesfornetworks
20191107 deeplearningapproachesfornetworks
 
Introduction to Neural Network
Introduction to Neural NetworkIntroduction to Neural Network
Introduction to Neural Network
 
Computer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathonComputer vision-nit-silchar-hackathon
Computer vision-nit-silchar-hackathon
 
Network recasting
Network recastingNetwork recasting
Network recasting
 
FINAL_Team_4.pptx
FINAL_Team_4.pptxFINAL_Team_4.pptx
FINAL_Team_4.pptx
 

Recently uploaded

Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
eddie19851
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
GetInData
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
Social Samosa
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
roli9797
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
g4dpvqap0
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
Walaa Eldin Moustafa
 

Recently uploaded (20)

Nanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdfNanandann Nilekani's ppt On India's .pdf
Nanandann Nilekani's ppt On India's .pdf
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdfEnhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
Enhanced Enterprise Intelligence with your personal AI Data Copilot.pdf
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
4th Modern Marketing Reckoner by MMA Global India & Group M: 60+ experts on W...
 
Analysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performanceAnalysis insight about a Flyball dog competition team's performance
Analysis insight about a Flyball dog competition team's performance
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
一比一原版(爱大毕业证书)爱丁堡大学毕业证如何办理
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data Lake
 

Weight Agnostic Neural Networks

  • 1. Weight Agnostic Neural Networks 2019.08.13 Yongsu Baek yongsubaek@mli.kaist.ac.kr
  • 2. Table of Contents • Overview • Motivation • Related Work • Architecture search • Bayesian Neural Networks • Algorithmic Information Theory(AIT) • Network Pruning • Neuroscience • WANN • Overview • Topology Search • Performance and Complexity • Results • Continuous control tasks • Image Classification • Discussion
  • 3. Overview • Not gradient-based <-> Gradient based • Only architecture <-> Weight parameter training • Evolution <-> Rearrangement, Pruning
  • 4. Motivation - Biology • “In biology, precocial species are those whose young already possess certain abilities from the moment of birth.”
  • 5. Motivation – Deep learning • Network Structure - "Strong inductive biases" • Convolutional networks [2], [3] • LSTM [4] 5
  • 6. Goal • Weight Agnostic Neural Network • Architectures with "strong inductive biases" • can already perform various tasks with random weights. • Weight 학습 없이도 충분히 task를 수행할 수 있는 Network structure를 찾아보 자! • By deemphasizing the importance of weights 1) Single shared weight 2) Evaluation on a wide range of single weight parameter • Novel neural network building blocks 6
  • 7. Related Work • Architecture search • Bayesian Neural Networks • Algorithmic Information Theory(AIT) • Network Pruning • Neuroscience 7
  • 8. Architecture Search • Evolutionary computing • Topology Search Algorithm - NEAT [5] • NAS • Basic building blocks with strong domain priors – CNNs, recurrent cells, self attention • Weight Training inner loop -> Slow • Architectures, once trained, outperform human-designed one • WANN • Creating network architectures which encode solutions • No training inner loop • The solution is innate to the structure 8
  • 9. Bayesian Neural Networks • Weight parameters sampled from learned distribution • Variance Network [6] • Sampled from Zero-mean, parameterized variance distribution • conventional BNNs naturally converge to zero-mean posteriors • Ensemble evaluation • WANN • sampling weights from a fixed uniform distribution with zero mean • evaluating performance on network ensembles 9 Variance Networks: When Expectation Does Not Meet Your Expectations., K. Neklyudov
  • 10. Algorithmic Information Theory(AIT) • Kolmogorov complexity • The minimum length of the program that can compute it • Occam’s razor • Simplifying neural networks by soft weight-sharing [7] • reducing the amount of information in weights by making them noisy, and simplifying the search space • WANN • finding minimal architectures • Weight-sharing to the entire network (AIT) • The weight as a rv sampled from a fixed distribution (BNN) 10 Simplifying neural networks by soft weight-sharing, S.J. Nowlan, G.E. Hinton., 1992
  • 11. Network Pruning • starts with a full, trained network, and takes away connections • Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask (2019) • pruned networks w/ randomly initialized weights • WANN • complementary to pruning • does not require prior training • no upper bound on the network’s complexity 11 Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask , H. Zhou, J. Lan, R. Liu, J. Yosinski.
  • 12. Neuroscience • connectome • “wiring diagram” of all neural connections • forming new synaptic connections and rewire • analyzed using graph theory • WANN • aims to learn network graphs that can encode skills and knowledge • ever-growing networks • small enough to be analyzed 12
  • 13. WANN • Weight of WANN • Searching Method • Topology Search • Performance and Complexity 13
  • 14. Weight of WANN • Architectures themselves encode solutions • The importance of weights must be minimized • Weight sampling • The curse of dimensionality • Weight-sharing • Efficient and tractable • A single shared weight sampled from a fixed distribution 14
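As a rough sketch (the matrix representation and all names here are illustrative, not the paper's implementation), evaluating a fixed topology with one shared weight just broadcasts that single value to every enabled connection:

```python
import numpy as np

def eval_with_shared_weight(adjacency, activations, x, w):
    """Forward pass through a feed-forward WANN-style graph in which every
    enabled connection uses the same shared weight value w.

    adjacency:   (n, n) 0/1 matrix; entry [i, j] = 1 if node i feeds node j
                 (nodes assumed topologically sorted, inputs first)
    activations: list of n activation functions, one per node
    x:           input values for the first len(x) nodes
    """
    n = adjacency.shape[0]
    values = np.zeros(n)
    values[:len(x)] = x
    for j in range(len(x), n):                     # hidden/output nodes in order
        pre = w * np.dot(adjacency[:, j], values)  # one shared weight everywhere
        values[j] = activations[j](pre)
    return values

# Tiny example: 2 inputs -> 1 tanh hidden node -> 1 linear output
adj = np.array([[0, 0, 1, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 1],
                [0, 0, 0, 0]])
acts = [None, None, np.tanh, lambda z: z]
out = eval_with_shared_weight(adj, acts, x=[1.0, -0.5], w=1.0)
```

Because the whole network depends on a single scalar w, sweeping w is cheap compared with tuning one weight per connection.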
  • 15. Searching Method 1. An initial population of minimal neural network topologies is created. 2. Each network is evaluated over multiple rollouts, with a different shared weight value assigned at each rollout. 15
  • 16. Searching Method 3. Networks are ranked according to their performance and complexity. 4. A new population is created by varying the highest ranked network topologies, chosen probabilistically through tournament selection 16
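Putting the four steps together, a minimal skeleton of the search loop might look like this (population size, elite fraction, and the Net/rollout stand-ins are placeholders, not the paper's hyperparameters):

```python
import random

WEIGHT_VALUES = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]  # one shared weight per rollout

def search(minimal_topology, mutate, rollout, generations=10, pop_size=16):
    # 1) initial population of minimal topologies
    population = [minimal_topology() for _ in range(pop_size)]
    for _ in range(generations):
        scored = []
        for net in population:
            # 2) evaluate over multiple rollouts, a different shared weight each
            rewards = [rollout(net, w) for w in WEIGHT_VALUES]
            mean_perf = sum(rewards) / len(rewards)
            # 3) rank by performance first, then by (lower) complexity
            scored.append((mean_perf, -net.num_connections(), net))
        scored.sort(key=lambda s: (s[0], s[1]), reverse=True)
        elites = [net for *_, net in scored[:max(1, pop_size // 4)]]
        # 4) new population by varying tournament-selected top networks
        population = [mutate(random.choice(elites)) for _ in range(pop_size)]
    return scored[0][2]  # best network of the last evaluated generation

# Smoke test with a toy "network" whose reward does not depend on its weights
class Net:
    def __init__(self, conns=1): self.conns = conns
    def num_connections(self): return self.conns

best = search(lambda: Net(),
              mutate=lambda n: Net(max(1, n.conns + random.choice([-1, 1]))),
              rollout=lambda n, w: -abs(w))
```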
  • 17. Topology Search • NEAT [5] • Networks are modified in one of three ways: 1) Insert Node 2) Add Connection 3) Change Activation • Feed-forward networks only 17 Evolving neural networks through augmenting topologies, K.O. Stanley, R. Miikkulainen.
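A minimal sketch of the three topology operators (the dict representation is hypothetical; real NEAT also tracks innovation numbers and a topological ordering, both omitted here):

```python
import random

ACTIVATIONS = ["linear", "step", "sin", "gaussian", "tanh", "sigmoid",
               "inverse", "abs", "relu", "cos"]  # WANN-style activation pool

def insert_node(net):
    """Split an existing connection (a, b) into (a, new) and (new, b)."""
    a, b = random.choice(list(net["conns"]))
    new = max(net["nodes"]) + 1
    net["nodes"][new] = random.choice(ACTIVATIONS)
    net["conns"].remove((a, b))
    net["conns"].update([(a, new), (new, b)])

def add_connection(net):
    """Connect two previously unconnected nodes."""
    a, b = sorted(random.sample(sorted(net["nodes"]), 2))
    if (a, b) not in net["conns"]:
        net["conns"].add((a, b))

def change_activation(net):
    """Reassign a random node's activation function."""
    node = random.choice(list(net["nodes"]))
    net["nodes"][node] = random.choice(ACTIVATIONS)

# Start from a minimal net: 2 inputs wired directly to 1 output
net = {"nodes": {0: "linear", 1: "linear", 2: "tanh"},
       "conns": {(0, 2), (1, 2)}}
insert_node(net)
add_connection(net)
change_activation(net)
```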
  • 18. Performance and Complexity • Networks are evaluated using several shared weight values • Fixed series of weight values: [-2, -1, -0.5, +0.5, +1, +2] • Mean performance • Prefer simpler networks (AIT) • Multi-objective optimization problem: • mean performance over all weight values • max performance of the single best weight value • the number of connections in the network 18
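The three ranking objectives for one candidate can be sketched as follows (Toy and the rollout lambda are stand-ins for a real network and episode return):

```python
WEIGHT_SERIES = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]  # fixed shared-weight values

def objectives(net, rollout):
    """The three quantities each candidate is ranked on in the
    multi-objective search: mean performance over all weight values,
    best performance of a single weight value, and connection count."""
    scores = [rollout(net, w) for w in WEIGHT_SERIES]
    return (sum(scores) / len(scores),  # mean performance (maximize)
            max(scores),                # best single weight (maximize)
            net.num_connections())      # complexity (minimize)

class Toy:  # stand-in network
    def num_connections(self):
        return 7

mean_p, best_p, conns = objectives(Toy(), rollout=lambda net, w: w * w)
```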
  • 19. Experimental Results • Continuous Control • CartPoleSwingUp • BipedalWalker-v2 • CarRacing-v0 • Image Classification • MNIST 19
  • 20. Experiment 1. Random weights: individual weights drawn from 𝑈(−2,2) 2. Random shared weight: a single shared weight drawn from 𝑈(−2,2) 3. Tuned shared weight: the highest performing shared weight value in range (-2,2) 4. Tuned weights: individual weights tuned using population-based REINFORCE 20
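A sketch of how the first three weight conditions could be instantiated (function names and the grid sweep are illustrative; condition 4 would additionally run population-based REINFORCE per weight, omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_weights(n_conns):
    """1) Each connection gets its own weight drawn from U(-2, 2)."""
    return rng.uniform(-2.0, 2.0, size=n_conns)

def random_shared_weight(n_conns):
    """2) One value drawn from U(-2, 2), shared by every connection."""
    return np.full(n_conns, rng.uniform(-2.0, 2.0))

def tuned_shared_weight(n_conns, evaluate, grid_size=41):
    """3) Best single shared value found by sweeping the range (-2, 2)."""
    grid = np.linspace(-2.0, 2.0, grid_size)
    best = max(grid, key=lambda w: evaluate(np.full(n_conns, w)))
    return np.full(n_conns, best)

# e.g. if performance happens to peak at a shared weight of 1.0, the sweep finds it
w3 = tuned_shared_weight(5, evaluate=lambda w: -abs(w.mean() - 1.0))
```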
  • 21. Continuous Control 21 • CartPoleSwingUp • Cannot be solved with a linear controller
  • 22. Continuous Control • BipedalWalker-v2 • non-trivial number of possible connections • 210 connections (SOTA: 2804 connections) 22
  • 23. Continuous Control • CarRacing-v0 • pre-trained VAE to compress the pixel representation • No pretrained hidden states of RNN 23
  • 24. Continuous Control Results • WANNs are not completely independent of the weight values • Single shared weight • easy tuning 24
  • 26. Discussion and Future Work • A method to search for simple neural networks • Fine-tuning • Few-shot learning • Continual lifelong learning • Multitask learning • Supermask [8] • Similar range of performance • Architecture search in a differentiable manner 26
  • 27. Discussion • Contribution? • The influence of network structure alone • The performance achievable by simple neural networks • WANN itself does not seem practically useful • There is a structure bias induced by the single shared weight • Further research is needed on optimization methods for discovering structures 27
  • 28. Thank you ! Any Questions ?
  • 29. References
  1) Weight Agnostic Neural Networks. A. Gaier, D. Ha. arXiv preprint arXiv:1906.04358, 2019.
  2) A powerful generative model using random weights for the deep image representation. K. He, Y. Wang, J. Hopcroft. Advances in Neural Information Processing Systems, pp. 631–639, 2016.
  3) Deep image prior. D. Ulyanov, A. Vedaldi, V. Lempitsky. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 9446–9454, 2018.
  4) Training recurrent networks by evolino. J. Schmidhuber, D. Wierstra, M. Gagliolo, F. Gomez. Neural Computation, Vol. 19(3), pp. 757–779, MIT Press, 2007.
  5) Evolving neural networks through augmenting topologies. K.O. Stanley, R. Miikkulainen. Evolutionary Computation, Vol. 10(2), pp. 99–127, MIT Press, 2002.
  6) Variance Networks: When Expectation Does Not Meet Your Expectations. K. Neklyudov, D. Molchanov, A. Ashukha, D. Vetrov. International Conference on Learning Representations (ICLR), 2019.
  7) Simplifying neural networks by soft weight-sharing. S.J. Nowlan, G.E. Hinton. Neural Computation, Vol. 4(4), pp. 473–493, MIT Press, 1992.
  8) Deconstructing Lottery Tickets: Zeros, Signs, and the Supermask. H. Zhou, J. Lan, R. Liu, J. Yosinski. arXiv preprint arXiv:1905.01067, 2019.

Editor's Notes

  1. Lizards and snakes are born with the ability to flee from predators. Ducks can swim and feed themselves as soon as they hatch, and turkeys can distinguish predators from birth.
  2. CNN: super-resolution, inpainting, and style transfer. LSTM: time-series prediction.
  3. The original NEAT optimizes weights and network topology simultaneously; here, only the topology is optimized. It has been shown that even random search over basic building blocks performs well.
  4. [6] (Samsung) -> A typical stochastic NN takes the mean of the weights as its prediction, whereas a variance network assumes zero mean and learns only the variance. Most BNNs eventually converge to a zero-mean posterior -> assuming zero mean leads to better learning.
  5. A good model is one that is best at compressing its data, including the cost of describing the model itself. Research in large-network settings is still ongoing.
  6. Performs better than chance at image classification -> the structure itself has power. A pruned network is limited by the full network it starts from.
  7. Connectome research on small, simple animals is ongoing. By setting weight training aside, the network is encouraged to keep growing and improving.
  8. In fact, the effect of changing the weight value is small; the (comparatively) large variation occurs within [-2, 2].
  9. Tests the ability of WANNs to learn abstract associations: it learns associations over information abstracted by the VAE rather than geometric information.
  10. The capabilities of the original WANN are preserved while its function can easily be transferred to other tasks.