Talk on highly nonlinear association analysis in gene expression data using deep neural networks, given at the 7th Network Modeling Workshop, Heidelberg, Germany, 2014.
Patrick Michl
Network Modeling
Autoencoders

Autoencoders are artificial neural networks …
• Artificial neural network
[Figure: a network mapping input data X to output data X′, built from perceptrons and Gaussian units]
[Figure: a perceptron outputs binary values {0, 1}; a Gaussian unit outputs real values in ℝ]
… with multiple hidden layers.
• Multiple hidden layers
[Figure: visible layers (input and output) with hidden layers in between]
Such networks are called deep networks.
Definition (deep network)
Deep networks are artificial neural networks with multiple hidden layers.
Autoencoders have a symmetric topology …
… with an odd number of hidden layers.
• Deep network
• Symmetric topology
The small layer in the center works like an information bottleneck …
… that creates a low-dimensional code for each sample in the input data.
• Information bottleneck
The upper stack does the encoding …
… and the lower stack does the decoding.
• Encoder
• Decoder
Definition (autoencoder)
Autoencoders are deep networks with a symmetric topology and an odd number of hidden layers, containing an encoder, a low-dimensional representation, and a decoder.
Autoencoders can be used to reduce the dimension of data …

Problem: dimensionality of the data
Idea:
1. Train the autoencoder to minimize the distance between input X and output X′
2. Encode X to a low-dimensional code Y
3. Decode the low-dimensional code Y to the output X′
4. Since X′ is reconstructed from Y alone, the code Y is a low-dimensional representation of the data
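To make the four steps concrete, here is a minimal sketch in Python (my own illustration, not code from the talk), assuming a toy dataset and a small PyTorch model trained with mean-squared reconstruction error:

```python
import torch
import torch.nn as nn

# Toy data standing in for gene expression samples.
X = torch.randn(100, 20)            # 100 samples, 20 features

encoder = nn.Sequential(nn.Linear(20, 8), nn.Sigmoid(), nn.Linear(8, 2))
decoder = nn.Sequential(nn.Linear(2, 8), nn.Sigmoid(), nn.Linear(8, 20))
model = nn.Sequential(encoder, decoder)

# Step 1: train so that the output X' reproduces the input X.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
for step in range(1000):
    optimizer.zero_grad()
    loss = ((model(X) - X) ** 2).mean()   # distance between X and X'
    loss.backward()
    optimizer.step()

# Steps 2-4: the bottleneck activations Y are the low-dimensional code.
with torch.no_grad():
    Y = encoder(X)                  # shape (100, 2)
```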
… if we can train them!
Training

In feedforward ANNs, backpropagation is a good approach.
Backpropagation
(1) The distance (error) between the current output X′ and the wanted output Y is computed. This gives an error function.
Example (linear neural unit with two inputs)
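The formula of this example did not survive extraction; a plausible reconstruction, assuming a linear unit with inputs $x_1, x_2$, weights $w_1, w_2$, and target output $t$:

$$y = w_1 x_1 + w_2 x_2, \qquad E(w_1, w_2) = \tfrac{1}{2}(y - t)^2, \qquad \frac{\partial E}{\partial w_i} = (y - t)\,x_i$$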
(2) By calculating the negative gradient of the error function, we get a vector that points in a direction which decreases the error.
(3) We update the parameters along this direction to decrease the error.
(4) We repeat these steps until the error converges.
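Steps (1)–(4) put together for the two-input linear unit of the example above (a toy sketch under assumed data, not the talk's code):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))        # 50 samples, two inputs
t = X @ np.array([1.5, -0.5])       # targets from a known "true" unit
w = np.zeros(2)                     # initial parameters

for step in range(100):
    y = X @ w                                  # current output
    error = 0.5 * np.mean((y - t) ** 2)        # (1) error function
    grad = X.T @ (y - t) / len(X)              # (2) gradient of the error
    w -= 0.1 * grad                            # (3) step against the gradient
                                               # (4) repeat
```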
… but the problem is the multiple hidden layers!

Problem: deep network
Backpropagation is known to be slow far away from the output layer …
• Very slow training
… and can converge to poor local minima.
• Possibly poor solutions
The task is to initialize the parameters close to a good solution!

Idea: initialize close to a good solution
Therefore the training of autoencoders has a pretraining phase …
• Pretraining
… which uses Restricted Boltzmann Machines (RBMs).
• Restricted Boltzmann Machines
Restricted Boltzmann Machine
• RBMs are Markov random fields
Markov random field: every unit influences every neighbor, and the coupling is undirected.

Motivation (Ising model): a set of magnetic dipoles (spins) is arranged in a graph (lattice) where neighbors are coupled with a given strength.
• Bipartite topology: visible units (v) and hidden units (h)
• Local energies are used to calculate the probabilities of the unit values
Training: contrastive divergence (Gibbs sampling)
[Figure: bipartite RBM with visible units v1–v4 and hidden units h1–h3]
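As an illustration of contrastive divergence, here is a sketch of a single CD-1 update for a small binary RBM (my own code, not the talk's; the four visible and three hidden units mirror the figure):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
n_v, n_h = 4, 3                              # matches the figure: v1-v4, h1-h3
W = rng.normal(scale=0.1, size=(n_v, n_h))   # couplings w_vh
b_v, b_h = np.zeros(n_v), np.zeros(n_h)      # unit biases

v0 = np.array([1.0, 0.0, 1.0, 1.0])          # one (toy) data vector

# Gibbs sampling chain v0 -> h0 -> v1 -> h1
p_h0 = sigmoid(v0 @ W + b_h)
h0 = (rng.random(n_h) < p_h0).astype(float)
v1 = (rng.random(n_v) < sigmoid(W @ h0 + b_v)).astype(float)
p_h1 = sigmoid(v1 @ W + b_h)

# CD-1 update: pull model statistics toward data statistics
lr = 0.1
W += lr * (np.outer(v0, p_h0) - np.outer(v1, p_h1))
b_v += lr * (v0 - v1)
b_h += lr * (p_h0 - p_h1)
```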
Top RBM
The visible units are therefore modeled with Gaussians to encode the data …
[Figure: top RBM with visible units v1–v4 and hidden units h1–h5]
… and many hidden units with sigmoids to encode dependencies.
The objective function is the sum of the local energies. Local energy of a Gaussian visible unit:

$$E_v := -\sum_h w_{vh}\,\frac{x_v}{\sigma_v}\,x_h + \frac{(x_v - b_v)^2}{2\sigma_v^2}$$
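For orientation (this is the standard Gaussian–Bernoulli RBM reading of the energy above, not spelled out on the slide), the resulting sampling distributions are:

$$p(x_h = 1 \mid v) = \mathrm{sigm}\Big(b_h + \sum_v w_{vh}\,\frac{x_v}{\sigma_v}\Big), \qquad p(x_v \mid h) \sim \mathcal{N}\Big(b_v + \sigma_v \sum_h w_{vh}\,x_h,\; \sigma_v^2\Big)$$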
Reduction RBMs
The reduction RBMs use binary units … which can be trained faster than the top layer. Local energies:

$$E_v := -\sum_h w_{vh}\,x_v x_h - b_v x_v, \qquad E_h := -\sum_v w_{vh}\,x_v x_h - b_h x_h$$

[Figure: reduction RBM with visible units v1–v4 and hidden units h1–h3]
After pretraining, backpropagation usually finds good solutions.
• Pretraining: top RBM (GRBM), reduction RBMs, unrolling
• Finetuning: backpropagation
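A compressed, runnable sketch of this schedule (my own illustration; biases are omitted for brevity and the layer sizes are arbitrary, not the talk's):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=30, lr=0.1, rng=np.random.default_rng(0)):
    # Tiny binary RBM trained with the CD-1 update sketched earlier.
    W = rng.normal(scale=0.1, size=(data.shape[1], n_hidden))
    for _ in range(epochs):
        for v0 in data:
            p_h0 = sigmoid(v0 @ W)
            h0 = (rng.random(n_hidden) < p_h0).astype(float)
            v1 = (rng.random(data.shape[1]) < sigmoid(W @ h0)).astype(float)
            W += lr * (np.outer(v0, p_h0) - np.outer(v1, sigmoid(v1 @ W)))
    return W

# Greedy layer-wise pretraining, then "unrolling" into encoder and decoder.
X = (np.random.default_rng(1).random((20, 8)) > 0.5).astype(float)
weights, data = [], X
for n_hidden in [6, 4, 2]:          # arbitrary toy layer sizes
    W = train_rbm(data, n_hidden)
    weights.append(W)
    data = sigmoid(data @ W)        # pass hidden activations to the next RBM
encoder = weights                   # initializes the encoder stack
decoder = [W.T for W in reversed(weights)]   # mirrored weights for the decoder
# Backpropagation then finetunes the unrolled autoencoder end to end.
```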
The algorithmic complexity of RBM training depends on the network size.
• Time complexity: O(i·n·w), where i is the number of iterations, n the number of nodes, and w the number of weights
• Memory complexity: O(w)
Results
Validation of the results
• Needs information about the true regulation
• Needs information about the descriptive power of the data
Without this information, validation can only be done using artificial datasets!
Artificial datasets
We simulate data in three steps:
Step 1: Choose the number of genes (E+S) and create random, bimodally distributed data.
Step 2: Manipulate the data in a fixed order.
Step 3: Add noise to the manipulated data and normalize the data.
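A hedged sketch of the three steps (the slides do not specify the manipulation, so step 2's nonlinear coupling of effect genes E to source genes S is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_s, n_e = 200, 4, 8     # S source genes, E effect genes (assumed split)

# Step 1: random bimodally distributed data (two Gaussian modes per gene).
modes = rng.integers(0, 2, size=(n_samples, n_s))
S = rng.normal(loc=2.0 * modes - 1.0, scale=0.3)

# Step 2 (assumption): manipulate in a fixed order by coupling the effect
# genes E nonlinearly to the source genes S.
E = np.tanh(S @ rng.normal(size=(n_s, n_e)))

# Step 3: add noise and normalize.
data = np.hstack([S, E]) + rng.normal(scale=0.1, size=(n_samples, n_s + n_e))
data = (data - data.mean(axis=0)) / data.std(axis=0)
```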
We train an autoencoder with 9 hidden layers and 165 hidden nodes:
Layers 1 & 9: 32 hidden units
Layers 2 & 8: 24 hidden units
Layers 3 & 7: 16 hidden units
Layers 4 & 6: 8 hidden units
Layer 5: 5 hidden units
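This topology as a sketch (only the 32-24-16-8-5 mirror comes from the slide; the input dimension n_genes and the sigmoid activations are my assumptions):

```python
import torch.nn as nn

def build_autoencoder(n_genes):
    # 32-24-16-8-5-8-16-24-32 hidden topology from the slide; input size
    # n_genes and sigmoid activations are assumptions.
    sizes = [n_genes, 32, 24, 16, 8, 5, 8, 16, 24, 32, n_genes]
    layers = []
    for a, b in zip(sizes[:-1], sizes[1:]):
        layers += [nn.Linear(a, b), nn.Sigmoid()]
    return nn.Sequential(*layers[:-1])   # linear output for Gaussian-like data
```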
Conclusion

• Autoencoders can improve modeling significantly by reducing the dimensionality of the data.
• Autoencoders preserve complex structures in their multilayer perceptron network. Analysing those networks (for example with knockout tests) could give more structural information.
• The drawback is the high computational cost.
Since the field of deep learning is growing in popularity (face recognition, voice recognition, image transformation), many improvements addressing these computational costs have been made.