My talk from ICML 2016 describing the "information sieve", a principle for decomposing information that enables a new approach for unsupervised representation learning.
1. The Information Sieve
Greg Ver Steeg and Aram Galstyan
[Figure: a sieve filtering soup. Soup = data; the “main ingredient” is extracted at each layer.]
2. Factorial code
• Carry recipe instead of soup
• Missing ingredients? Make more soup
• Compression
• Prediction
• Generative model
[Figure: a recipe card listing Ingredient 1, Ingredient 2, …]
An invertible transform that makes the components independent.
Finding such a transform is, in general, an intractable problem.
Instead, we use a sequence of transformations that incrementally removes dependence.
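For reference, the goal can be stated in one line; a sketch in my notation (the slide itself shows only the cartoon):

```latex
% A factorial code: an invertible transform f whose output
% components are statistically independent.
Y = f(X), \qquad p(y_1, \ldots, y_m) \;=\; \prod_{j=1}^{m} p(y_j)
```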
3. Two Steps
1. Find the most informative function Y_k of the input data
2. Transform the data to remove the information in Y_k, then repeat
[Figure: input poured through successive sieves; each layer splits off the main ingredient, leaving the remainder.]
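A toy, runnable sketch of this two-step loop on binary data. The choice of Y here (the single column most informative about the rest) and the XOR remainder are simplifications for illustration, not the paper's actual optimization:

```python
import numpy as np

def mi(a, b):
    """Empirical mutual information (bits) between two binary arrays."""
    h = lambda p: -sum(q * np.log2(q) for q in p if q > 0)
    joint = [np.mean((a == i) & (b == j)) for i in (0, 1) for j in (0, 1)]
    return (h([np.mean(a == 0), np.mean(a == 1)])
            + h([np.mean(b == 0), np.mean(b == 1)])
            - h(joint))

def sieve_layer(X):
    """One toy pass of the sieve on binary X (n_samples, n_vars).
    Step 1: take Y to be the column most informative about the others
    (a crude stand-in for maximizing the total correlation explained).
    Step 2: XOR Y into every column; given Y this is invertible
    (x_j = xbar_j ^ y), so no information is lost."""
    scores = [sum(mi(X[:, i], X[:, j]) for j in range(X.shape[1]) if j != i)
              for i in range(X.shape[1])]
    y = X[:, int(np.argmax(scores))].copy()
    return y, X ^ y[:, None]  # (main ingredient, remainder)

# Toy data: five noisy copies of one hidden bit. Y recovers the hidden
# bit and the remainder columns are (nearly) independent of it.
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=5000)
X = z[:, None] ^ (rng.random((5000, 5)) < 0.1).astype(z.dtype)
y, Xbar = sieve_layer(X)
print([round(mi(y, Xbar[:, j]), 3) for j in range(5)])  # all near zero
```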
4. The main ingredient: multivariate information
• Multivariate mutual information, or Total Correlation (Watanabe, 1960)
• TC(X|Y) = 0 if and only if Y “explains” all the dependence in X
• So we search for Y that minimizes TC(X|Y)
• Equivalently, we define the total correlation explained by Y as:
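The formulas on this slide did not survive the export; a reconstruction from the standard definitions:

```latex
% Total correlation (Watanabe, 1960) and its conditional version:
TC(X) = \sum_{i=1}^{n} H(X_i) - H(X), \qquad
TC(X \mid Y) = \sum_{i=1}^{n} H(X_i \mid Y) - H(X \mid Y)
% The total correlation explained by Y; minimizing TC(X|Y)
% is equivalent to maximizing this:
TC(X ; Y) = TC(X) - TC(X \mid Y)
```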
5. The main ingredient: Total Correlation Explanation (CorEx)
• Optimize over all probabilistic functions
• Solution has special form that makes it tractable
• Computational complexity is linear in the number of variables
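To make the objective concrete, here is a minimal plug-in estimator of TC for discrete data. This brute-force version enumerates the joint distribution, unlike CorEx's linear-complexity optimization:

```python
import numpy as np
from collections import Counter

def entropy(samples):
    """Empirical Shannon entropy (bits) of a sequence of hashable samples."""
    counts = np.array(list(Counter(samples).values()), dtype=float)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def total_correlation(X):
    """Plug-in estimate of TC(X) = sum_i H(X_i) - H(X) for an
    (n_samples, n_vars) array of discrete values."""
    marginals = sum(entropy(X[:, i]) for i in range(X.shape[1]))
    joint = entropy(map(tuple, X))
    return marginals - joint

# Two noisy copies of the same bit: TC equals their mutual information,
# about 0.53 bits for 10% flip noise.
rng = np.random.default_rng(0)
x1 = rng.integers(0, 2, size=100_000)
x2 = np.where(rng.random(100_000) < 0.1, 1 - x1, x1)
print(total_correlation(np.column_stack([x1, x2])))
```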
6. Sift out the main ingredient: remainder info
The remainder is a transformation of the inputs with two properties:
1. The remainder contains no info about Y
2. The transformation is invertible
[Figure: soup (input) poured through the sieve, leaving the remainder.]
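In symbols, the two properties can be written per variable (a hedged reconstruction, with \bar{X}_i denoting the remainder for variable X_i):

```latex
% Property 1: the remainder carries no information about Y.
I(\bar{X}_i \,;\, Y) = 0
% Property 2: invertibility -- X_i is exactly recoverable from
% the remainder together with Y.
H(X_i \mid \bar{X}_i, Y) = 0
```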
8. Iterative sifting
The dependence at each layer of the sieve decreases until it reaches zero, i.e. complete independence.
[Plot: dependence at layer r, decreasing as the layers extract dependence.]
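The plot's message can also be written as an equation; a hedged reconstruction, where X^{(k)} denotes the remainder after layer k, with X^{(0)} = X:

```latex
% Under the two remainder properties, the dependence in the data
% decomposes across layers, plus whatever remains after r layers:
TC(X) \;=\; \sum_{k=1}^{r} TC\left(X^{(k-1)} ; Y_k\right) \;+\; TC\left(X^{(r)}\right)
% Each explained term is nonnegative, so the remaining dependence
% TC(X^{(r)}) shrinks toward zero as layers are added.
```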
9. Recover spatial clusters from fMRI data
[Figure: recovered clusters compared across three panels: Ground truth, ICA, Sieve.]
An example of recovering spatial clusters in brain data from temporal activation patterns.
10. Lossy compression and in-painting
• Sieve representation with 12 layers/bits/binary latent factors on MNIST digits
We can use the sieve for standard prediction and generative modeling tasks.
11. Lossless compression (on MNIST)
• Same-size codebooks for the Random and Sieve-based codes
• (gzip is sequence-based, shown for reference)
A proof of principle for lossless compression, though specialized compression techniques do better on MNIST.
Method           Naive   gzip   Random codebook   Sieve codebook
Bits per digit   784     328    267               243
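Bits per digit for a baseline like gzip can be measured directly; a minimal sketch with placeholder data (real binarized MNIST would be needed to reproduce the ~328 bits/digit above):

```python
import zlib
import numpy as np

# Placeholder stand-in for binarized MNIST: an (n_digits, 784) array of
# 0/1 pixels. The naive code spends exactly 784 bits per digit; a
# sequence-based compressor does better on real digits.
rng = np.random.default_rng(0)
digits = (rng.random((1000, 784)) < 0.15).astype(np.uint8)

raw = np.packbits(digits, axis=1).tobytes()   # naive: 784 bits per digit
compressed = zlib.compress(raw, 9)            # DEFLATE, same family as gzip
print("naive bits/digit:", 8 * len(raw) / len(digits))
print("zlib  bits/digit:", 8 * len(compressed) / len(digits))
```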
12. Conclusion
• Incrementally decomposing multivariate information is useful, practical, and delicious
• Could improve with joint optimization and better transformations for remainder info
Link to all papers and code: http://bit.ly/corex_info
Contact: gregv@isi.edu, galstyan@isi.edu
• The extension to continuous random variables is nontrivial but more practical, and demonstrates connections to “common information”: “Sifting Common Information from Many Variables”, arXiv:1606.02307.
Editor's Notes
I have a cartoon version of the talk…[describe]...that’s like 90% of it.
I’m going to stick with the soup metaphor:
All that remains is to say what we mean by “main ingredient”, and what it means to “remove” it.
Before that, though, why would you want to do this?
Filtering out all the ingredients in soup is really a way to reverse engineer the recipe.
The technical equivalent of this is called a factorial code: decomposing the data into independent components.
There are many advantages…
Unfortunately, this isn’t very easy.
Our sieve gives us an easy way to do this incrementally, so that our representation is more independent at each step.
Let’s abstract a bit...
At every layer of this sieve, we have discrete random variables with iid samples drawn from an unknown distribution.
Step 1 finds the “main ingredient” by solving the optimization on the slide (choosing Y to maximize the total correlation explained).
Step 2 filters it out
Why the need for a qualification? It seems to me that information by itself is somewhat useless for learning. A bit of noise and a bit of signal are not really distinguishable.
High-dimensional data is only difficult if there are nontrivial relationships, so that’s what we need to characterize.
(CAREFUL not to ramble here…)
In soup terms, we have two criteria:
The ingredient is completely extracted. If not, we might end up sifting out some carrots at layer 1 and more at layer 3.
We can invert the transformation. We just throw the carrots back in and we are right where we started.
WHY do we define remainder in this way exactly? The next two slides will show why that’s a powerful way to go.
Defining the main ingredient as multivariate information and correctly defining the remainder information leads finally to some very nice expressions.
Ok, so now we have a way to progressively extract the most important ingredients in our soup. We mentioned the benefits at the beginning, and we still get almost all of those benefits from doing it progressively. In fact, in a way we are better off because our list of ingredients is ranked by importance.
PUT IN PLOT?
Synthetic data, so we know the ground truth.
Plotting the weights; note that this is a linear version, described in a different paper.