Structure learning
with deep autoencoders
Network Modeling Seminar, 30/4/2013
Patrick Michl
Agenda
Autoencoders
Biological Model
Validation & Implementation
Autoencoders
Real-world data is usually high dimensional …
[Figure: dataset in dimensions x1 and x2 | model]
… which makes structural analysis and modeling complicated!
[Figure: dataset in dimensions x1 and x2 | model: F(x1, x2) = ?]
Dimensionality reduction techniques like PCA …
[Figure: dataset in dimensions x1 and x2 | PCA projection]
… cannot preserve complex structures!
[Figure: dataset in dimensions x1 and x2 | PCA model: x2 = αx1 + β]
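To make the limitation concrete, here is a minimal sketch (assuming NumPy and scikit-learn are available; the dataset and parameters are illustrative, not those used in the talk): PCA fitted to points on a curve can only return a straight line, so the curved structure is lost in the reconstruction.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative curved dataset: x2 is a nonlinear function of x1.
rng = np.random.default_rng(0)
x1 = rng.uniform(-1.0, 1.0, size=500)
x2 = x1 ** 2 + rng.normal(scale=0.05, size=500)
X = np.column_stack([x1, x2])

# Project onto the first principal component and reconstruct.
pca = PCA(n_components=1)
code = pca.fit_transform(X)          # 1-dimensional linear code
X_hat = pca.inverse_transform(code)  # lies on a straight line x2 = a*x1 + b

# The reconstruction error stays large because the structure is not linear.
print("mean reconstruction error:", np.mean(np.sum((X - X_hat) ** 2, axis=1)))
```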
Therefore the analysis of unknown structures …
… needs more sophisticated, nonlinear techniques!
[Figure: dataset in dimensions x1 and x2 | nonlinear model: x2 = f(x1)]
Autoencoders are artificial neural networks …
• Artificial neural network
[Figure: network mapping input data X to output data X′, built from perceptrons and Gaussian units]
[Figure: perceptron units take values in [0, 1]; Gaussian units take values in ℝ]
… with multiple hidden layers.
• Multiple hidden layers
[Figure: input data X and output data X′ form the visible layers; the layers in between are the hidden layers]
Such networks are called deep networks.
Definition (deep network)
Deep networks are artificial neural networks with multiple hidden layers.
Autoencoders have a symmetric topology …
… with an odd number of hidden layers.
The small layer in the center works like an information bottleneck …
… that creates a low-dimensional code for each sample in the input data.
The upper stack does the encoding …
… and the lower stack does the decoding.
• Deep network
• Symmetric topology
• Information bottleneck
• Encoder
• Decoder

Definition (autoencoder)
Autoencoders are deep networks with a symmetric topology and an odd number of hidden layers, containing an encoder, a low-dimensional central representation, and a decoder.
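As a toy illustration of this definition, the sketch below (my own NumPy example; the layer sizes and random initialization are arbitrary assumptions) builds a symmetric stack with an odd number of hidden layers and checks that a forward pass maps a sample X to an output X′ of the same dimension, with a narrow code in the middle.

```python
import numpy as np

def sigm(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
layer_sizes = [8, 6, 3, 6, 8]   # visible, hidden layers (odd count), visible
weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x, weights):
    """Propagate one sample through the symmetric stack and return all activations."""
    activations = [x]
    for W in weights:
        activations.append(sigm(activations[-1] @ W))
    return activations

x = rng.normal(size=8)            # one input sample X
acts = forward(x, weights)
code = acts[len(acts) // 2]       # 3-dimensional code at the bottleneck
x_prime = acts[-1]                # output X' has the same dimension as X
print(code.shape, x_prime.shape)  # (3,) (8,)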
Autoencoders can be used to reduce the dimension of data …
Problem: dimensionality of data
Idea (see the sketch below):
1. Train the autoencoder to minimize the distance between input X and output X′
2. Encode X to a low-dimensional code Y
3. Decode the low-dimensional code Y to the output X′
4. The output X′ then carries only the low-dimensional structure captured by Y
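A compact way to express these four steps in a modern framework (a sketch assuming TensorFlow/Keras, which is not the tooling described in this talk; layer sizes, activations, and training settings are illustrative only):

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(1000, 8).astype("float32")   # illustrative input data X

# Symmetric deep autoencoder with a 2-dimensional bottleneck.
inputs = keras.Input(shape=(8,))
h = layers.Dense(6, activation="sigmoid")(inputs)
code = layers.Dense(2, activation="sigmoid", name="code")(h)   # low-dimensional code Y
h = layers.Dense(6, activation="sigmoid")(code)
outputs = layers.Dense(8, activation="linear")(h)              # output X'

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=50, batch_size=32, verbose=0)     # step 1: minimize |X - X'|

encoder = keras.Model(inputs, code)
Y = encoder.predict(X)            # step 2: low-dimensional code Y
X_prime = autoencoder.predict(X)  # steps 3-4: reconstruction X' from Y
```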
… if we can train them!
In feedforward ANNs, backpropagation is a good approach.
Training
Backpropagation
(1) The distance (error) between the current output X′ and the desired output Y is computed. This gives an error function:
X′ = F(X),   error = (X′ − Y)²
Example (linear neural unit with two inputs)
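The slide only names this example, so the following is my own reconstruction of what such an example typically looks like (NumPy; the data, targets, and learning rate are invented for illustration): a linear unit y = w1·x1 + w2·x2 + b trained by repeatedly stepping against the gradient of the squared error, i.e. the recipe listed in steps (1) to (4) of this section.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                    # two inputs x1, x2
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.5          # illustrative target

w = np.zeros(2)
b = 0.0
lr = 0.1

for _ in range(200):                              # (4) repeat
    y_hat = X @ w + b                             # current output of the linear unit
    err = y_hat - y                               # (1) error between output and target
    grad_w = 2 * X.T @ err / len(y)               # (2) gradient of the squared error
    grad_b = 2 * err.mean()
    w -= lr * grad_w                              # (3) step against the gradient
    b -= lr * grad_b

print(w, b)   # close to [3.0, -2.0] and 0.5
```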
(2) By calculating −∇error we get a vector that points in a direction which decreases the error
(3) We update the parameters to decrease the error
(4) We repeat this procedure
… the problem is the multiple hidden layers!
Problem: Deep Network
Backpropagation is known to be slow far away from the output layer …
• Very slow training
… and can converge to poor local minima.
• Possibly a bad solution
The task is to initialize the parameters close to a good solution!
Idea: Initialize close to a good solution
Therefore the training of autoencoders has a pretraining phase …
• Pretraining
… which uses Restricted Boltzmann Machines (RBMs)
• Restricted Boltzmann Machines
Restricted Boltzmann Machine
• RBMs are Markov Random Fields
Markov Random Field
Every unit influences its neighbors; the coupling is undirected.
Motivation (Ising model): a set of magnetic dipoles (spins) is arranged in a graph (lattice), where neighbors are coupled with a given strength.
• Bipartite topology: visible units (v) and hidden units (h)
• The local energies determine the probabilities of the unit values
Training: contrastive divergence (Gibbs sampling)
[Figure: bipartite RBM with visible units v1…v4 and hidden units h1…h3]
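As a sketch of what one contrastive-divergence update looks like for a binary RBM (my own NumPy illustration; the sizes, learning rate, and sample are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
sigm = lambda z: 1.0 / (1.0 + np.exp(-z))

n_v, n_h = 4, 3                       # visible units v1..v4, hidden units h1..h3
W = rng.normal(scale=0.01, size=(n_v, n_h))
b_v = np.zeros(n_v)                   # visible biases
b_h = np.zeros(n_h)                   # hidden biases
lr = 0.1

def cd1_update(v0):
    """One CD-1 step: Gibbs sample v -> h -> v' -> h' and move the parameters."""
    global W, b_v, b_h
    p_h0 = sigm(b_h + v0 @ W)                     # p(h=1 | v0)
    h0 = (rng.random(n_h) < p_h0).astype(float)   # sample hidden states
    p_v1 = sigm(b_v + h0 @ W.T)                   # p(v=1 | h0)
    v1 = (rng.random(n_v) < p_v1).astype(float)   # reconstructed visible states
    p_h1 = sigm(b_h + v1 @ W)
    # Positive phase (data) minus negative phase (reconstruction).
    W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
    b_v += lr * (v0 - v1)
    b_h += lr * (p_h0 - p_h1)

v = np.array([1.0, 0.0, 1.0, 0.0])    # one binary training sample
cd1_update(v)
```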
[Figure: Gibbs sampling in a Restricted Boltzmann Machine]
Training: top RBM
The top-layer RBM transforms real-valued data into binary codes.
V := set of visible units, with x_v := value of unit v and x_v ∈ ℝ for all v ∈ V
H := set of hidden units, with x_h := value of unit h and x_h ∈ {0, 1} for all h ∈ H
Therefore the visible units are modeled with Gaussians to encode the data …
x_v ~ N( b_v + Σ_h w_vh · x_h ,  σ_v )
where σ_v := std. dev. of unit v, b_v := bias of unit v, w_vh := weight of edge (v, h)
[Figure: Gaussian visible units v1…v4 connected to binary hidden units h1…h5]
… and the (many) hidden units with sigmoids to encode the dependencies:
x_h ~ sigm( b_h + Σ_v w_vh · x_v / σ_v )
where σ_v := std. dev. of unit v, b_h := bias of unit h, w_vh := weight of edge (v, h)
The objective function is the sum of the local energies.
Local energy:
E_h := −Σ_v w_vh · (x_v / σ_v) · x_h + x_h · b_h
E_v := −Σ_h w_vh · (x_v / σ_v) · x_h + (x_v − b_v)² / (2 σ_v²)
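To make the two conditionals above concrete, here is a small NumPy sketch (my own illustration with arbitrary sizes, not the speaker's implementation) of one Gibbs step in such a Gaussian–Bernoulli RBM: hidden states are sampled from the sigmoid given the scaled visible values, and visible values are resampled from the Gaussian given the hidden states.

```python
import numpy as np

rng = np.random.default_rng(0)
sigm = lambda z: 1.0 / (1.0 + np.exp(-z))

n_v, n_h = 4, 5                        # e.g. 4 Gaussian visible units, 5 binary hidden units
W = rng.normal(scale=0.01, size=(n_v, n_h))
b_v = np.zeros(n_v)                    # visible biases
b_h = np.zeros(n_h)                    # hidden biases
sigma_v = np.ones(n_v)                 # std. dev. of the visible units

x_v = rng.normal(size=n_v)             # one real-valued data vector

# h | v : x_h ~ sigm(b_h + sum_v w_vh * x_v / sigma_v)
p_h = sigm(b_h + (x_v / sigma_v) @ W)
x_h = (rng.random(n_h) < p_h).astype(float)

# v | h : x_v ~ N(b_v + sum_h w_vh * x_h, sigma_v)
mean_v = b_v + W @ x_h
x_v_new = rng.normal(loc=mean_v, scale=sigma_v)
```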
Training: reduction RBMs
The next RBM layer maps the dependency encoding …
V := set of visible units, with x_v := value of unit v and x_v ∈ {0, 1} for all v ∈ V
H := set of hidden units, with x_h := value of unit h and x_h ∈ {0, 1} for all h ∈ H
… from the upper layer …
x_v ~ sigm( b_v + Σ_h w_vh · x_h )
where b_v := bias of unit v, w_vh := weight of edge (v, h)
[Figure: binary RBM with visible units v1…v4 and hidden units h1…h3]
… to a smaller number of sigmoids …
x_h ~ sigm( b_h + Σ_v w_vh · x_v )
where b_h := bias of unit h, w_vh := weight of edge (v, h)
… which can be trained faster than the top layer.
Local energy:
E_v := −Σ_h w_vh · x_v · x_h + x_v · b_v
E_h := −Σ_v w_vh · x_v · x_h + x_h · b_h
Training: unrolling
The symmetric topology allows us to skip further training: the pretrained RBM stack is unrolled, and the decoder reuses its weights.
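A sketch of the unrolling step (my own NumPy illustration; the layer sizes are assumptions): the weight matrices learned by the stacked RBMs are reused, transposed and in reverse order, for the decoder half, so the full autoencoder starts from the pretrained solution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretrained RBM weights for the encoder stack (illustrative shapes: 8 -> 6 -> 3).
rbm_weights = [rng.normal(scale=0.1, size=(8, 6)),
               rng.normal(scale=0.1, size=(6, 3))]

# Unrolling: the decoder half reuses the same matrices in reverse order, transposed.
encoder_weights = [W.copy() for W in rbm_weights]
decoder_weights = [W.T.copy() for W in reversed(rbm_weights)]
autoencoder_weights = encoder_weights + decoder_weights   # 8 -> 6 -> 3 -> 6 -> 8

for W in autoencoder_weights:
    print(W.shape)   # (8, 6) (6, 3) (3, 6) (6, 8)
```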
After pretraining, backpropagation usually finds good solutions.
• Pretraining: top RBM (GRBM), reduction RBMs, unrolling
• Finetuning: backpropagation
The algorithmic complexity of RBM training depends on the network size.
• Time complexity: O(i·n·w), with i = number of iterations, n = number of nodes, w = number of weights
• Memory complexity: O(w)
Agenda
Autoencoders
Biological Model
Validation & Implementation
Network Modeling
Restricted Boltzmann Machines (RBM)
How to model the topological structure?
[Figure: genes S and E connected through transcription factors (TF)]
We identify S and E with the visible layer …
… and the TFs with the hidden layer of an RBM.
Training the RBM then gives us a model.
[Figure: RBM with the genes S and E as visible units and the TFs as hidden units]
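One way to read such a model, sketched below in NumPy (my own illustration; the weight values, gene/TF names, and thresholding rule are assumptions, not results from the talk): after training, the weight matrix between the visible units (genes S, E) and the hidden units (TFs) can be scanned for strong couplings, which suggest which TF is linked to which gene.

```python
import numpy as np

# Assume a trained RBM weight matrix W with one row per visible unit (genes S, E)
# and one column per hidden unit (transcription factor TF).
genes = ["s1", "s2", "e1", "e2"]
tfs = ["tf1", "tf2"]
W = np.array([[ 1.8, -0.1],
              [ 1.5,  0.2],
              [-0.2,  2.1],
              [ 0.1,  1.9]])          # illustrative values, not real results

threshold = 1.0                        # assumed cutoff for a "strong" coupling
for i, gene in enumerate(genes):
    for j, tf in enumerate(tfs):
        if abs(W[i, j]) > threshold:
            print(f"{tf} -- {gene} (weight {W[i, j]:+.1f})")
```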
Agenda
Autoencoder
Biological Model
Implementation & Results
Results
Validation of the results
• Needs information about the true regulation
• Needs information about the descriptive power of the data
Without this information, validation can only be done using artificial datasets!
Artificial datasets
We simulate data in three steps:
Step 1: Choose the number of genes (E + S) and create random, bimodally distributed data
Step 2: Manipulate the data in a fixed order
Step 3: Add noise to the manipulated data and normalize it
Simulation
Step 1: 8 visible nodes (4 E, 4 S). Create random data: random {−1, +1} + N(0, σ = 0.5)
Step 2: Manipulate the data:
e1 = 0.25·s1 + 0.25·s2 + 0.25·s3 + 0.25·s4
e2 = 0.5·s1 + 0.5·Noise
e3 = 0.5·s1 + 0.5·Noise
e4 = 0.5·s1 + 0.5·Noise
Step 3: Add noise N(0, σ = 0.5)
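Putting the three steps together as a NumPy sketch (my reading of the slide's parameters; the sample size, the distribution of "Noise", and the normalization are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 1000

# Step 1: 4 S-genes with bimodal values: random {-1, +1} plus N(0, 0.5).
s = rng.choice([-1.0, 1.0], size=(n_samples, 4)) + rng.normal(scale=0.5, size=(n_samples, 4))

# Step 2: 4 E-genes as fixed combinations of the S-genes (and pure noise terms).
noise = lambda: rng.normal(scale=0.5, size=n_samples)   # assumed distribution of "Noise"
e1 = 0.25 * (s[:, 0] + s[:, 1] + s[:, 2] + s[:, 3])
e2 = 0.5 * s[:, 0] + 0.5 * noise()
e3 = 0.5 * s[:, 0] + 0.5 * noise()
e4 = 0.5 * s[:, 0] + 0.5 * noise()
X = np.column_stack([s, e1, e2, e3, e4])          # 8 visible nodes (4 S, 4 E)

# Step 3: add observation noise and normalize each column.
X += rng.normal(scale=0.5, size=X.shape)
X = (X - X.mean(axis=0)) / X.std(axis=0)
```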
We analyse the data X with an RBM.
We train an autoencoder with 9 hidden layers and 165 hidden nodes:
Layers 1 & 9: 32 hidden units
Layers 2 & 8: 24 hidden units
Layers 3 & 7: 16 hidden units
Layers 4 & 6: 8 hidden units
Layer 5: 5 hidden units
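A sketch of that topology (again assuming Keras purely for illustration; the activations and training settings are not specified on the slide, and the RBM pretraining described earlier is not reproduced here):

```python
from tensorflow import keras
from tensorflow.keras import layers

hidden_sizes = [32, 24, 16, 8, 5, 8, 16, 24, 32]   # 9 hidden layers, 165 hidden nodes
assert sum(hidden_sizes) == 165

inputs = keras.Input(shape=(8,))                    # 8 visible nodes from the simulation
h = inputs
for size in hidden_sizes:
    h = layers.Dense(size, activation="sigmoid")(h)
outputs = layers.Dense(8, activation="linear")(h)   # output X' with the input dimension

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")
# autoencoder.fit(X, X, epochs=100, batch_size=32)  # X from the simulation sketch above
```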
We transform the data from X to X′ and reduce the dimensionality.
We analyse the transformed data X′ with an RBM.
Let's compare the models.
Another example, with more nodes and a larger autoencoder.
Conclusion
• Autoencoders can improve modeling significantly by reducing the dimensionality of data.
• Autoencoders preserve complex structures in their multilayer perceptron network. Analysing those networks (for example with knockout tests) could give more structural information.
• The drawback is the high computational cost. Since the field of deep learning is becoming more popular (face recognition, voice recognition, image transformation), many improvements addressing the computational cost have been made.
Acknowledgement
eilsLABS
PD Dr. Rainer König
Prof. Dr. Roland Eils
Network Modeling Group

Structure learning with Deep Autoencoders