Structure learning
with deep autoencoders
Network Modeling Seminar, 30/4/2013
Patrick Michl
Agenda
Autoencoders
Biological Model
Validation & Implementation
Autoencoders
Real-world data is usually high dimensional …
[Figure: dataset in dimensions x1 and x2 | model]
… which makes structural analysis and modeling complicated!
[Figure: dataset in dimensions x1 and x2 | model: F(x1, x2) = ?]
Dimensionality reduction techniques like PCA …
[Figure: dataset in dimensions x1 and x2 | PCA projection]
… cannot preserve complex structures!
[Figure: dataset in dimensions x1 and x2 | PCA model: x2 = αx1 + β]
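To make the limitation concrete, here is a minimal sketch (assuming NumPy and scikit-learn are available; the dataset and parameters are illustrative, not those used in the talk): PCA fitted to points on a curve can only return a straight line, so the curved structure is lost in the reconstruction.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative curved dataset: x2 is a nonlinear function of x1.
rng = np.random.default_rng(0)
x1 = rng.uniform(-1.0, 1.0, size=500)
x2 = x1 ** 2 + rng.normal(scale=0.05, size=500)
X = np.column_stack([x1, x2])

# Project onto the first principal component and reconstruct.
pca = PCA(n_components=1)
code = pca.fit_transform(X)          # 1-dimensional linear code
X_hat = pca.inverse_transform(code)  # lies on a straight line x2 = a*x1 + b

# The reconstruction error stays large because the structure is not linear.
print("mean reconstruction error:", np.mean(np.sum((X - X_hat) ** 2, axis=1)))
```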
Therefore the analysis of unknown structures …
… needs more sophisticated, nonlinear techniques!
[Figure: dataset in dimensions x1 and x2 | nonlinear model: x2 = f(x1)]
Autoencoders are artificial neural networks …
• Artificial neural network
[Figure: network mapping input data X to output data X′, built from perceptrons and Gaussian units]
[Figure: perceptron units take values in [0, 1]; Gaussian units take values in ℝ]
… with multiple hidden layers.
• Multiple hidden layers
[Figure: input data X and output data X′ form the visible layers; the layers in between are the hidden layers]
Such networks are called deep networks.
Definition (deep network)
Deep networks are artificial neural networks with multiple hidden layers.
Autoencoders have a symmetric topology …
… with an odd number of hidden layers.
The small layer in the center works like an information bottleneck …
… that creates a low-dimensional code for each sample in the input data.
The upper stack does the encoding …
… and the lower stack does the decoding.
• Deep network
• Symmetric topology
• Information bottleneck
• Encoder
• Decoder

Definition (autoencoder)
Autoencoders are deep networks with a symmetric topology and an odd number of hidden layers, containing an encoder, a low-dimensional central representation, and a decoder.
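As a toy illustration of this definition, the sketch below (my own NumPy example; the layer sizes and random initialization are arbitrary assumptions) builds a symmetric stack with an odd number of hidden layers and checks that a forward pass maps a sample X to an output X′ of the same dimension, with a narrow code in the middle.

```python
import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
layer_sizes = [8, 6, 3, 6, 8]   # visible, hidden layers (odd count), visible
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x, weights):
    """Propagate one sample through the symmetric stack and return all activations."""
    activations = [x]
    for W in weights:
        activations.append(sigm(activations[-1] @ W))
    return activations

x = rng.normal(size=8)            # one input sample X
acts = forward(x, weights)
code = acts[len(acts) // 2]       # 3-dimensional code at the bottleneck
x_prime = acts[-1]                # output X' has the same dimension as X
print(code.shape, x_prime.shape)  # (3,) (8,)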
Autoencoders can be used to reduce the dimension of data …
Problem: dimensionality of data
Idea (see the sketch below):
1. Train the autoencoder to minimize the distance between input X and output X′
2. Encode X to a low-dimensional code Y
3. Decode the low-dimensional code Y to the output X′
4. The output X′ then carries only the low-dimensional structure captured by Y
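A compact way to express these four steps in a modern framework (a sketch assuming TensorFlow/Keras, which is not the tooling described in this talk; layer sizes, activations, and training settings are illustrative only):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(1000, 8).astype("float32")   # illustrative input data X

# Symmetric deep autoencoder with a 2-dimensional bottleneck.
inputs = keras.Input(shape=(8,))
h = layers.Dense(6, activation="sigmoid")(inputs)
code = layers.Dense(2, activation="sigmoid", name="code")(h)   # low-dimensional code Y
h = layers.Dense(6, activation="sigmoid")(code)
outputs = layers.Dense(8, activation="linear")(h)              # output X'

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=50, batch_size=32, verbose=0)     # step 1: minimize |X - X'|

encoder = keras.Model(inputs, code)
Y = encoder.predict(X)            # step 2: low-dimensional code Y
X_prime = autoencoder.predict(X)  # steps 3-4: reconstruction X' from Y
```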
… if we can train them!
In feedforward ANNs, backpropagation is a good approach.
Training
Backpropagation
(1) The distance (error) between the current output X′ and the desired output Y is computed. This gives an error function:
X′ = F(X),   error = (X′ − Y)²
Example (linear neural unit with two inputs)
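The slide only names this example, so the following is my own reconstruction of what such an example typically looks like (NumPy; the data, targets, and learning rate are invented for illustration): a linear unit y = w1·x1 + w2·x2 + b trained by repeatedly stepping against the gradient of the squared error, i.e. the recipe listed in steps (1) to (4) of this section.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                    # two inputs x1, x2
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5          # illustrative target

w = np.zeros(2)
b = 0.0
lr = 0.1

for _ in range(200):                              # (4) repeat
    y_hat = X @ w + b                             # current output of the linear unit
    err = y_hat - y                               # (1) error between output and target
    grad_w = 2 * X.T @ err / len(y)               # (2) gradient of the squared error
    grad_b = 2 * err.mean()
    w -= lr * grad_w                              # (3) step against the gradient
    b -= lr * grad_b

print(w, b)   # close to [3.0, -2.0] and 0.5
```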
(2) By calculating −∇error we get a vector that points in a direction which decreases the error
(3) We update the parameters to decrease the error
(4) We repeat this procedure
… the problem is the multiple hidden layers!
Problem: Deep Network
Backpropagation is known to be slow far away from the output layer …
• Very slow training
… and can converge to poor local minima.
• Possibly a bad solution
The task is to initialize the parameters close to a good solution!
Idea: Initialize close to a good solution
Therefore the training of autoencoders has a pretraining phase …
• Pretraining
… which uses Restricted Boltzmann Machines (RBMs)
• Restricted Boltzmann Machines
Restricted Boltzmann Machine
• RBMs are Markov Random Fields
Markov Random Field
Every unit influences its neighbors; the coupling is undirected.
Motivation (Ising model): a set of magnetic dipoles (spins) is arranged in a graph (lattice), where neighbors are coupled with a given strength.
• Bipartite topology: visible units (v) and hidden units (h)
• The local energies determine the probabilities of the unit values
Training: contrastive divergence (Gibbs sampling)
[Figure: bipartite RBM with visible units v1…v4 and hidden units h1…h3]
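As a sketch of what one contrastive-divergence update looks like for a binary RBM (my own NumPy illustration; the sizes, learning rate, and sample are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
sigm = lambda z: 1.0 / (1.0 + np.exp(-z))

n_v, n_h = 4, 3                       # visible units v1..v4, hidden units h1..h3
W = rng.normal(scale=0.01, size=(n_v, n_h))
b_v = np.zeros(n_v)                   # visible biases
b_h = np.zeros(n_h)                   # hidden biases
lr = 0.1

def cd1_update(v0):
    """One CD-1 step: Gibbs sample v -> h -> v' -> h' and move the parameters."""
    global W, b_v, b_h
    p_h0 = sigm(b_h + v0 @ W)                     # p(h=1 | v0)
    h0 = (rng.random(n_h) < p_h0).astype(float)   # sample hidden states
    p_v1 = sigm(b_v + h0 @ W.T)                   # p(v=1 | h0)
    v1 = (rng.random(n_v) < p_v1).astype(float)   # reconstructed visible states
    p_h1 = sigm(b_h + v1 @ W)
    # Positive phase (data) minus negative phase (reconstruction).
    W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
    b_v += lr * (v0 - v1)
    b_h += lr * (p_h0 - p_h1)

v = np.array([1.0, 0.0, 1.0, 0.0])    # one binary training sample
cd1_update(v)
```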
[Figure: Gibbs sampling in a Restricted Boltzmann Machine]
Training: top RBM
The top-layer RBM transforms real-valued data into binary codes.
V := set of visible units, with x_v := value of unit v and x_v ∈ ℝ for all v ∈ V
H := set of hidden units, with x_h := value of unit h and x_h ∈ {0, 1} for all h ∈ H
Therefore the visible units are modeled with Gaussians to encode the data …
x_v ~ N( b_v + Σ_h w_vh · x_h ,  σ_v )
where σ_v := std. dev. of unit v, b_v := bias of unit v, w_vh := weight of edge (v, h)
[Figure: Gaussian visible units v1…v4 connected to binary hidden units h1…h5]
… and the (many) hidden units with sigmoids to encode the dependencies:
x_h ~ sigm( b_h + Σ_v w_vh · x_v / σ_v )
where σ_v := std. dev. of unit v, b_h := bias of unit h, w_vh := weight of edge (v, h)
The objective function is the sum of the local energies.
Local energy:
E_h := −Σ_v w_vh · (x_v / σ_v) · x_h + x_h · b_h
E_v := −Σ_h w_vh · (x_v / σ_v) · x_h + (x_v − b_v)² / (2 σ_v²)
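To make the two conditionals above concrete, here is a small NumPy sketch (my own illustration with arbitrary sizes, not the speaker's implementation) of one Gibbs step in such a Gaussian–Bernoulli RBM: hidden states are sampled from the sigmoid given the scaled visible values, and visible values are resampled from the Gaussian given the hidden states.

```python
import numpy as np

rng = np.random.default_rng(0)
sigm = lambda z: 1.0 / (1.0 + np.exp(-z))

n_v, n_h = 4, 5                        # e.g. 4 Gaussian visible units, 5 binary hidden units
W = rng.normal(scale=0.01, size=(n_v, n_h))
b_v = np.zeros(n_v)                    # visible biases
b_h = np.zeros(n_h)                    # hidden biases
sigma_v = np.ones(n_v)                 # std. dev. of the visible units

x_v = rng.normal(size=n_v)             # one real-valued data vector

# h | v : x_h ~ sigm(b_h + sum_v w_vh * x_v / sigma_v)
p_h = sigm(b_h + (x_v / sigma_v) @ W)
x_h = (rng.random(n_h) < p_h).astype(float)

# v | h : x_v ~ N(b_v + sum_h w_vh * x_h, sigma_v)
mean_v = b_v + W @ x_h
x_v_new = rng.normal(loc=mean_v, scale=sigma_v)
```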
Training: reduction RBMs
The next RBM layer maps the dependency encoding …
V := set of visible units, with x_v := value of unit v and x_v ∈ {0, 1} for all v ∈ V
H := set of hidden units, with x_h := value of unit h and x_h ∈ {0, 1} for all h ∈ H
… from the upper layer …
x_v ~ sigm( b_v + Σ_h w_vh · x_h )
where b_v := bias of unit v, w_vh := weight of edge (v, h)
[Figure: binary RBM with visible units v1…v4 and hidden units h1…h3]
… to a smaller number of sigmoids …
x_h ~ sigm( b_h + Σ_v w_vh · x_v )
where b_h := bias of unit h, w_vh := weight of edge (v, h)
… which can be trained faster than the top layer.
Local energy:
E_v := −Σ_h w_vh · x_v · x_h + x_v · b_v
E_h := −Σ_v w_vh · x_v · x_h + x_h · b_h
Training: unrolling
The symmetric topology allows us to skip further training: the pretrained RBM stack is unrolled, and the decoder reuses its weights.
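A sketch of the unrolling step (my own NumPy illustration; the layer sizes are assumptions): the weight matrices learned by the stacked RBMs are reused, transposed and in reverse order, for the decoder half, so the full autoencoder starts from the pretrained solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretrained RBM weights for the encoder stack (illustrative shapes: 8 -> 6 -> 3).
rbm_weights = [rng.normal(scale=0.1, size=(8, 6)),
               rng.normal(scale=0.1, size=(6, 3))]

# Unrolling: the decoder half reuses the same matrices in reverse order, transposed.
encoder_weights = [W.copy() for W in rbm_weights]
decoder_weights = [W.T.copy() for W in reversed(rbm_weights)]
autoencoder_weights = encoder_weights + decoder_weights   # 8 -> 6 -> 3 -> 6 -> 8

for W in autoencoder_weights:
    print(W.shape)   # (8, 6) (6, 3) (3, 6) (6, 8)
```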
After pretraining, backpropagation usually finds good solutions.
• Pretraining: top RBM (GRBM), reduction RBMs, unrolling
• Finetuning: backpropagation
The algorithmic complexity of RBM training depends on the network size.
• Time complexity: O(i·n·w), with i = number of iterations, n = number of nodes, w = number of weights
• Memory complexity: O(w)
Agenda
Autoencoders
Biological Model
Validation & Implementation
Network Modeling
Restricted Boltzmann Machines (RBM)
How to model the topological structure?
[Figure: genes S and E connected through transcription factors (TF)]
We identify S and E with the visible layer …
… and the TFs with the hidden layer of an RBM.
Training the RBM then gives us a model.
[Figure: RBM with the genes S and E as visible units and the TFs as hidden units]
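One way to read such a model, sketched below in NumPy (my own illustration; the weight values, gene/TF names, and thresholding rule are assumptions, not results from the talk): after training, the weight matrix between the visible units (genes S, E) and the hidden units (TFs) can be scanned for strong couplings, which suggest which TF is linked to which gene.

```python
import numpy as np

# Assume a trained RBM weight matrix W with one row per visible unit (genes S, E)
# and one column per hidden unit (transcription factor TF).
genes = ["s1", "s2", "e1", "e2"]
tfs = ["tf1", "tf2"]
W = np.array([[ 1.8, -0.1],
              [ 1.5,  0.2],
              [-0.2,  2.1],
              [ 0.1,  1.9]])          # illustrative values, not real results

threshold = 1.0                        # assumed cutoff for a "strong" coupling
for i, gene in enumerate(genes):
    for j, tf in enumerate(tfs):
        if abs(W[i, j]) > threshold:
            print(f"{tf} -- {gene} (weight {W[i, j]:+.1f})")
```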
Agenda
Autoencoder
Biological Model
Implementation & Results
Results
Validation of the results
• Needs information about the true regulation
• Needs information about the descriptive power of the data
Without this information, validation can only be done using artificial datasets!
Artificial datasets
We simulate data in three steps:
Step 1: Choose the number of genes (E + S) and create random, bimodally distributed data
Step 2: Manipulate the data in a fixed order
Step 3: Add noise to the manipulated data and normalize it
Simulation
Step 1: 8 visible nodes (4 E, 4 S). Create random data: random {−1, +1} + N(0, σ = 0.5)
Step 2: Manipulate the data:
e1 = 0.25·s1 + 0.25·s2 + 0.25·s3 + 0.25·s4
e2 = 0.5·s1 + 0.5·Noise
e3 = 0.5·s1 + 0.5·Noise
e4 = 0.5·s1 + 0.5·Noise
Step 3: Add noise N(0, σ = 0.5)
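Putting the three steps together as a NumPy sketch (my reading of the slide's parameters; the sample size, the distribution of "Noise", and the normalization are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Step 1: 4 S-genes with bimodal values: random {-1, +1} plus N(0, 0.5).
s = rng.choice([-1.0, 1.0], size=(n_samples, 4)) + rng.normal(scale=0.5, size=(n_samples, 4))

# Step 2: 4 E-genes as fixed combinations of the S-genes (and pure noise terms).
noise = lambda: rng.normal(scale=0.5, size=n_samples)   # assumed distribution of "Noise"
e1 = 0.25 * (s[:, 0] + s[:, 1] + s[:, 2] + s[:, 3])
e2 = 0.5 * s[:, 0] + 0.5 * noise()
e3 = 0.5 * s[:, 0] + 0.5 * noise()
e4 = 0.5 * s[:, 0] + 0.5 * noise()
X = np.column_stack([s, e1, e2, e3, e4])          # 8 visible nodes (4 S, 4 E)

# Step 3: add observation noise and normalize each column.
X += rng.normal(scale=0.5, size=X.shape)
X = (X - X.mean(axis=0)) / X.std(axis=0)
```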
We analyse the data X with an RBM.
We train an autoencoder with 9 hidden layers and 165 hidden nodes:
Layers 1 & 9: 32 hidden units
Layers 2 & 8: 24 hidden units
Layers 3 & 7: 16 hidden units
Layers 4 & 6: 8 hidden units
Layer 5: 5 hidden units
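A sketch of that topology (again assuming Keras purely for illustration; the activations and training settings are not specified on the slide, and the RBM pretraining described earlier is not reproduced here):

```python
from tensorflow import keras
from tensorflow.keras import layers

hidden_sizes = [32, 24, 16, 8, 5, 8, 16, 24, 32]   # 9 hidden layers, 165 hidden nodes
assert sum(hidden_sizes) == 165

inputs = keras.Input(shape=(8,))                    # 8 visible nodes from the simulation
h = inputs
for size in hidden_sizes:
    h = layers.Dense(size, activation="sigmoid")(h)
outputs = layers.Dense(8, activation="linear")(h)   # output X' with the input dimension

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X, X, epochs=100, batch_size=32)  # X from the simulation sketch above
```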
We transform the data from X to X′ and reduce the dimensionality.
We analyse the transformed data X′ with an RBM.
Let's compare the models.
Another example, with more nodes and a larger autoencoder.
Conclusion
• Autoencoders can improve modeling significantly by reducing the dimensionality of data.
• Autoencoders preserve complex structures in their multilayer perceptron network. Analysing those networks (for example with knockout tests) could give more structural information.
• The drawback is the high computational cost. Since the field of deep learning is becoming more popular (face recognition, voice recognition, image transformation), many improvements addressing the computational cost have been made.
Acknowledgement
eilsLABS
PD Dr. Rainer König
Prof. Dr. Roland Eils
Network Modeling Group

Structure learning with Deep Autoencoders