2. Overview
• Introduction to Rate-Distortion theory
- Elements of Information Theory, Thomas M. Cover, Joy A. Thomas (1991)
• Bridging Rate-Distortion theory and the VAE
- Fixing a Broken ELBO (Alemi+ ICML 2018)
• Rate-Distortion theory and memory distortion in humans (for a coffee break)
- An experimental study of the effect of language on the reproduction of visually perceived forms (Carmichael+ 1932)
- Semantic Compression of Episodic Memories (Nagy+ 2018)
• Recent research
- Exact Rate-Distortion in Autoencoders via Echo Noise (Brekelmans+ NeurIPS 2019)
3. Rate-Distortion Theory
• This theory can be stated as follows:
- Given a source distribution and a distortion measure,
what is the minimum expected distortion achievable at a particular rate?
- Or, what is the minimum rate required to achieve a particular distortion?
R(D) = min_{p(x̂|x) : Σ p(x) p(x̂|x) d(x, x̂) ≤ D} I(X; X̂)
X → Encoder → Y = f(X) → Decoder → X̂
• Rate-Distortion theory:
- created by Claude Shannon in 1948 in famous paper:
“A Mathematical Theory of Communication”
- We care about the mutual information I(X; X̂) and the distortion D(X, X̂)
- Rate: the number of bits per data sample to be stored or transmitted
- Distortion: the difference between the input and output signals
Elements of Information Theory, Thomas M. Cover, Joy A. Thomas
4. Examples of Rate-Distortion curves
• A N(0, σ²) source with squared-error distortion:
R(D) = (1/2) log(σ²/D) for 0 ≤ D ≤ σ²; R(D) = 0 for D > σ²
Test channel: X̂ ∼ N(0, σ² − D), Z ∼ N(0, D), X = X̂ + Z ∼ N(0, σ²)
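As a quick numerical check of the Gaussian formula above, here is a minimal sketch (the function name `gaussian_rate` is my own) evaluating R(D) in nats:

```python
import math

def gaussian_rate(D, sigma2):
    """R(D) for an N(0, sigma2) source under squared-error distortion, in nats."""
    if D <= 0:
        raise ValueError("distortion must be positive")
    # Above the source variance, zero rate suffices: send nothing and
    # reconstruct with the mean.
    return 0.5 * math.log(sigma2 / D) if D < sigma2 else 0.0

# Rate decreases as we tolerate more distortion, hitting 0 at D = sigma^2.
for D in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"D = {D:3.1f}  ->  R(D) = {gaussian_rate(D, sigma2=4.0):.3f} nats")
```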
• A Bernoulli(p) source with Hamming distortion:
R(D) = H(p) − H(D) for 0 ≤ D ≤ min{p, 1 − p}; R(D) = 0 for D > min{p, 1 − p}
Test channel: a binary symmetric channel from X̂ to X with crossover probability D (transition probabilities 1 − D and D), where X̂ ∼ Bernoulli(r) is chosen so that X ∼ Bernoulli(p)
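The Bernoulli curve can be evaluated the same way; this is a minimal sketch (function names `H` and `bernoulli_rate` are my own), working in bits:

```python
import math

def H(p):
    """Binary entropy in bits, with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bernoulli_rate(D, p):
    """R(D) for a Bernoulli(p) source under Hamming distortion, in bits."""
    if D >= min(p, 1 - p):
        return 0.0  # tolerating this much bit-flip error needs no information
    return H(p) - H(D)

# A fair coin reproduced with at most 10% bit errors needs about 0.531 bits/symbol.
print(f"{bernoulli_rate(0.1, 0.5):.3f}")  # -> 0.531
```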
5. Fixing a Broken ELBO
Alexander A. Alemi, Ben Poole, Ian Fischer, Joshua V. Dillon, Rif A. Saurous, Kevin Murphy, ICML 2018
Alemi, A., Poole, B., Fischer, I., Dillon, J., Saurous, R., Murphy, K. (2017). Fixing a Broken ELBO. https://arxiv.org/abs/1711.00464
• β-VAE loss function: L_β = D + β·R (reconstruction distortion plus β times the rate term)
• Definition: rate R = E_{x∼q(x)}[KL(q(z|x) ‖ m(z))]; distortion D = −E[log d(x|z)]
• Testing the RD curve for several model combinations (e, d, m):
- e ∈ {−, +}: simple or complex encoder
- d ∈ {−, +}: simple or complex decoder
- m ∈ {−, +, v}: simple, complex, or VampPrior marginal
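To make the distortion-rate trade-off concrete, here is a minimal NumPy sketch of a β-weighted loss (the names, the unit-variance Gaussian decoder, and the N(0, I) prior are my own assumptions, not the paper's code): distortion is squared reconstruction error, and rate is the closed-form KL from a diagonal-Gaussian posterior to the prior.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, logvar, beta):
    """Distortion + beta * rate, averaged over the batch.

    Distortion: squared reconstruction error (Gaussian decoder, up to constants).
    Rate: KL( N(mu, diag(exp(logvar))) || N(0, I) ) in closed form.
    """
    distortion = np.sum((x - x_recon) ** 2, axis=-1)
    rate = 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar, axis=-1)
    return np.mean(distortion + beta * rate)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
mu, logvar = np.zeros((8, 2)), np.zeros((8, 2))
# Perfect reconstruction with a prior-matching posterior: both terms vanish.
print(beta_vae_loss(x, x, mu, logvar, beta=1.0))  # -> 0.0
```

Raising β penalizes the rate term more heavily, which is what traces out different points on the RD curve.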
6. An experimental study of the effect of language on the reproduction of visually perceived forms
• History:
✓ G. E. Müller, Zur Analyse der Gedächtnistätigkeit und des Vorstellungsverlaufes (1913)
(On the analysis of memory activity and the course of mental imagery)
✴ The reproduction changes after the passage of time
✓ F. Wulf, Über die Veränderung von Vorstellungen (Gedächtnis und Gestalt) (1921)
(On the change of mental images (memory and Gestalt))
✴ The identification involves the linguistic naming of the objects
• Carmichael et al. reported on how the use of language affects the reproduction of visually perceived forms
• Experiment: first, the participants (60 subjects) were shown the stimulus figures with either word list 1 or word list 2 assigned. They then reproduced the visual forms for each category.
L. Carmichael, H. P. Hogan, A. A. Walter (1932)
Carmichael, L., Hogan, H., Walter, A. (1932). An experimental study of the effect of language on the reproduction of visually perceived form. Journal of Experimental Psychology 15(1), 73. https://dx.doi.org/10.1037/h0072671
7. An experimental study of the effect of language on the reproduction of visually perceived forms
• Result: linguistic labels distort the memory of the visual forms
8. Semantic Compression of Episodic Memories
• The relevance of RDT for explaining errors and biases in human visual working memory
• The result shows reconstructions at different rates (i.e., different values of β)
David G. Nagy, Balazs Török, Gergõ Orbán (2018)
Nagy, D., Török, B., Orbán, G. (2018). Semantic Compression of Episodic Memories. https://arxiv.org/abs/1806.07990
9. • Idea:
✓ Determine the noise in a data-driven fashion that doesn’t require restrictive prior distributional assumptions
• Result:
✓ the model provides an exact expression for the mutual information and outperforms flow-based methods without the need to train additional distributional transformations
• Losses:
✓ VAE loss: q(z|x) = μ(x) + σ(x)ϵ, ϵ ∼ N(0, I)
✓ Echo noise: q(z|x) = μ(x) + σ(x)ϵ(x), where
ϵ = Σ_{l=0}^{∞} (Π_{l′=1}^{l} σ(x_{l′})) μ(x_l), x_l ∼ q(x)
• Echo noise properties:
1. z has the same distribution as ϵ
2. The mutual information is I(X; Z) = −E[log |det σ(x)|]
Exact Rate-Distortion in Autoencoders via Echo Noise
Rob Brekelmans, Daniel Moyer, Aram Galstyan, Greg Ver Steeg, NeurIPS 2019
Brekelmans, R., Moyer, D., Galstyan, A., Ver Steeg, G. (2019). Exact Rate-Distortion in Autoencoders via Echo Noise. https://arxiv.org/abs/1904.07199
Illustration of setting μ(x) = x, σ(x) = 0.5
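In the illustrated setting μ(x) = x, σ(x) = 0.5, the infinite sum defining ϵ can be approximated by truncating it; the sketch below does exactly that (the truncation depth is my own choice, and the paper's implementation differs). With these settings, Var(ϵ) = Σ_l 0.25^l = 4/3.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_echo_noise(n, depth=40):
    """Truncated echo noise for mu(x) = x, sigma(x) = 0.5, x ~ N(0, 1):
    eps = sum_{l=0}^{depth-1} (prod_{l'=1}^{l} sigma(x_{l'})) * mu(x_l), x_l ~ q(x).
    """
    xs = rng.normal(size=(depth, n))   # iid draws x_l from the source q(x)
    eps = xs[0].copy()                 # l = 0 term: empty product times mu(x_0)
    coeff = np.full(n, 1.0)
    for l in range(1, depth):
        coeff *= 0.5                   # sigma(x_l) = 0.5 is constant here
        eps += coeff * xs[l]
    return eps

eps = sample_echo_noise(200_000)
print(eps.var())  # close to the theoretical value 4/3
```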
10. • How do we derive the echo noise?
✓ We choose the noise to enforce an equivalence between the distributions q(z) and q(ϵ) for the mutual information computation
✓ q(z) = ∫ q_ϕ(z|x) q(x) dx, with q_ϕ(z|x) = μ(x) + σ(x)ϵ
✓ Make the noise match the channel output: ϵ = μ(x′), x′ ∼ q(x)
✓ Iterating this substitution gives
ϵ = μ(x₀) + σ(x₀)(μ(x₁) + σ(x₁)(μ(x₂) + σ(x₂)(…)))
✓ We can guarantee that the noise and marginal distributions match in the limit
• Lossy compression in VAEs
✓ ELBO
✓ KL term
• Rate-Distortion objective
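The matching of the noise and marginal distributions can be checked numerically in the illustrated setting μ(x) = x, σ(x) = 0.5: unrolling a truncated version of the nested recursion and forming z = μ(x) + σ(x)ϵ should give z and ϵ the same zero-mean Gaussian distribution, verified here by comparing variances (both ≈ 4/3). The truncation depth and sample count are my own choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def echo_eps(n, depth=40):
    """Truncated recursion eps = mu(x0) + sigma(x0) * (mu(x1) + sigma(x1) * (...)),
    specialised to mu(x) = x, sigma(x) = 0.5, x ~ N(0, 1)."""
    xs = rng.normal(size=(depth, n))
    eps = xs[-1].copy()
    for l in range(depth - 2, -1, -1):  # unroll the nesting from the inside out
        eps = xs[l] + 0.5 * eps
    return eps

eps = echo_eps(200_000)
x = rng.normal(size=200_000)            # a fresh source sample, independent of eps
z = x + 0.5 * eps                       # channel output z = mu(x) + sigma(x) * eps
print(eps.var(), z.var())               # both close to 4/3: q(z) matches q(eps)
```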
11. • At low rates, Echo maintains only the high-level features of the input image
• At high rates, Echo achieves better reconstruction error than InfoVAE