  1. 1. Sensory Coding & Hierarchical Representations Michael S. Lewicki Computer Science Department & Center for the Neural Basis of Cognition Carnegie Mellon University
  2. 2. [Diagram of the visual hierarchy: retina → LGN → V1 → V2 → V4 → “IT”] NIPS 2007 Tutorial 2 Michael S. Lewicki ! Carnegie Mellon
  3. 3. Fig. 1. Anatomical and functional hierarchies in the macaque visual system (Hegdé and Felleman, 2007). The human visual system (not shown) is believed to be roughly similar. A, A schematic summary of the laminar patterns of feed-forward (or ascending) and feed-back (or descending) connections for visual area V1. The laminar patterns vary somewhat from one visual area to the next. But in general, the connections are complementary, so that the ascending connections terminate in the granular layer (layer 4) and the descending connections avoid it. The connections are generally reciprocal, in that an area that sends feed-forward connections to another area also receives feedback connections from it. The visual anatomical hierarchy is defined based on, among other things, the laminar patterns of these interconnections among the various areas. See text for details. B, A simplified version of the visual anatomical hierarchy in the macaque monkey. For the complete version, see Felleman and Van Essen (1991). See text for additional details. AIT = anterior inferotemporal; LGN = lateral geniculate nucleus; LIP = lateral intraparietal; MT = middle temporal; MST = medial superior temporal; PIT = posterior inferotemporal; V1 = visual area 1; V2 = visual area 2; V4 = visual area 4; VIP = ventral intraparietal. C, A model of hierarchical processing of visual information proposed by David Marr (1982). D, A schematic illustration of the presumed parallels between the anatomical and functional hierarchies. It is widely presumed not only that visual processing is hierarchical but also that … NIPS 2007 Tutorial 3 Michael S. Lewicki ! Carnegie Mellon
  4. 4. Palmer and Rock, 1994 NIPS 2007 Tutorial 4 Michael S. Lewicki ! Carnegie Mellon
  5. 5. V1 simple cells [figure: simple-cell receptive field maps] Hubel and Wiesel, 1959; DeAngelis et al., 1995 NIPS 2007 Tutorial 5 Michael S. Lewicki ! Carnegie Mellon
  6. 6. NIPS 2007 Tutorial 6 Michael S. Lewicki ! Carnegie Mellon
  7. 7. NIPS 2007 Tutorial 7 Michael S. Lewicki ! Carnegie Mellon
  8. 8. Out of the retina • > 23 distinct neural pathways, no simple functional division • Suprachiasmatic nucleus: circadian rhythm • Accessory optic system: stabilize retinal image • Superior colliculus: integrate visual and auditory information with head movements, direct eyes • Pretectum: plays a role in adjusting pupil size and tracking large moving objects • Pregeniculate: cells responsive to ambient light • Lateral geniculate (LGN): main “relay” to visual cortex; contains 6 distinct layers, each with 2 sublayers. Organization is very complex NIPS 2007 Tutorial 8 Michael S. Lewicki ! Carnegie Mellon
  9. 9. V1 simple cell integration of LGN cells [diagram: LGN cell receptive fields converging onto a V1 simple cell] Hubel and Wiesel, 1963 NIPS 2007 Tutorial 9 Michael S. Lewicki ! Carnegie Mellon
  10. 10. Hyper-column model Hubel and Wiesel, 1978 NIPS 2007 Tutorial 10 Michael S. Lewicki ! Carnegie Mellon
  11. 11. Anatomical circuitry in V1 Figure 1 Contributions of individual neurons to local excitatory connections between cortical layers. From left to right, typical spiny neurons in layers 4C, 2–4B, 5, and 6 are shown. Dendritic arbors are illustrated with thick lines and axonal arbors with finer lines. Each cell projects axons specifically to only a subset of the layers. This simplified diagram focuses on connections between layers 2–4B, 4C, 5, and 6 and does not incorporate details of circuitry specific for subdivisions of these layers. A model of the interactions between these neurons is shown in Figure 2. from Callaway, 1998 NIPS 2007 Tutorial 11 Michael S. Lewicki ! Carnegie Mellon
  12. 12. Where is this headed? Ramon y Cajal NIPS 2007 Tutorial 12 Michael S. Lewicki ! Carnegie Mellon
  13. 13. A wing would be a most mystifying structure if one did not know that birds flew. Horace Barlow, 1961 An algorithm is likely to be understood more readily by understanding the nature of the problem being solved than by examining the mechanism in which it is embodied. David Marr, 1982
  14. 14. A theoretical approach • Look at the system from a functional perspective: What problems does it need to solve? • Abstract from the details: Make predictions from theoretical principles. • Models are bottom-up; theories are top-down. What are the relevant computational principles?NIPS 2007 Tutorial 14 Michael S. Lewicki ! Carnegie Mellon
  15. 15. Representing structure in natural images image from Field (1994)NIPS 2007 Tutorial 15 Michael S. Lewicki ! Carnegie Mellon
  16. 16. Representing structure in natural images image from Field (1994)NIPS 2007 Tutorial 16 Michael S. Lewicki ! Carnegie Mellon
  17. 17. Representing structure in natural images image from Field (1994)NIPS 2007 Tutorial 17 Michael S. Lewicki ! Carnegie Mellon
  18. 18. Representing structure in natural images Need to describe all image structure in the scene. What representation is best? image from Field (1994)NIPS 2007 Tutorial 18 Michael S. Lewicki ! Carnegie Mellon
  19. 19. V1 simple cells [figure: simple-cell receptive field maps] Hubel and Wiesel, 1959; DeAngelis et al., 1995 NIPS 2007 Tutorial 19 Michael S. Lewicki ! Carnegie Mellon
  20. 20. Oriented Gabor models of individual simple cells figure from Daugman, 1990; data from Jones and Palmer, 1987NIPS 2007 Tutorial 20 Michael S. Lewicki ! Carnegie Mellon
  21. 21. 2D Gabor wavelet captures spatial structure • 2D Gabor functions • Wavelet basis generated by dilations, translations, and rotations of a single basis function • Can also control phase and aspect ratio • (drifting) Gabor functions are what the eye “sees best”NIPS 2007 Tutorial 21 Michael S. Lewicki ! Carnegie Mellon
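To make the Gabor parameterization above concrete, here is a minimal sketch (not taken from the tutorial) that evaluates a 2D Gabor function on a pixel grid; the parameter names and default values are illustrative choices, not Lewicki's notation.

```python
import numpy as np

def gabor_2d(size=32, wavelength=8.0, theta=0.0, phase=0.0,
             sigma=4.0, gamma=0.7):
    """Evaluate a 2D Gabor function on a size x size pixel grid.

    theta rotates the filter, gamma sets the aspect ratio of the
    Gaussian envelope, and phase shifts the sinusoidal carrier.
    """
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]
    # rotate coordinates into the filter's preferred orientation
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (gamma * yr)**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * xr / wavelength + phase)
    return envelope * carrier

if __name__ == "__main__":
    g = gabor_2d(theta=np.pi / 4)
    print(g.shape, float(g.max()), float(g.min()))
```

Dilating, translating, and rotating this single function generates the wavelet family described on the slide.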
  22. 22. Recoding with Gabor functions Pixel entropy = 7.57 bits Recoding with 2D Gabor functions Coefficient entropy = 2.55 bitsNIPS 2007 Tutorial 22 Michael S. Lewicki ! Carnegie Mellon
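The entropy figures quoted on this slide come from quantizing values and measuring the entropy of the resulting histogram. A hedged sketch of that generic calculation follows; the synthetic "pixel-like" and "coefficient-like" samples are stand-ins, not the slide's actual image or Gabor coefficients.

```python
import numpy as np

def quantized_entropy(values, bin_width=1.0):
    """Entropy in bits per sample of values quantized to bins of width bin_width."""
    bins = np.round(np.asarray(values) / bin_width).astype(int)
    _, counts = np.unique(bins, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pixels = rng.uniform(0, 255, size=100_000)       # broad, pixel-like value distribution
    coeffs = rng.laplace(scale=3.0, size=100_000)    # peaked, coefficient-like distribution
    print("pixel-like entropy:      ", round(quantized_entropy(pixels), 2), "bits")
    print("coefficient-like entropy:", round(quantized_entropy(coeffs), 2), "bits")
```

The peaked, heavy-tailed coefficient distribution needs far fewer bits per sample at the same quantization, which is the point of the slide's comparison.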
  23. 23. How is the V1 population organized?NIPS 2007 Tutorial 23 Michael S. Lewicki ! Carnegie Mellon
  24. 24. A general approach to coding: redundancy reduction [scatter plot: correlation of adjacent pixel intensities in natural images] Redundancy reduction is equivalent to efficient coding. NIPS 2007 Tutorial 24 Michael S. Lewicki ! Carnegie Mellon
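The redundancy summarized in the scatter plot can be measured directly as the correlation between neighboring pixel intensities. A minimal sketch, using a smoothed noise image as a stand-in for a natural image:

```python
import numpy as np

def adjacent_pixel_correlation(img):
    """Correlation coefficient between horizontally adjacent pixel intensities."""
    left = img[:, :-1].ravel()
    right = img[:, 1:].ravel()
    return float(np.corrcoef(left, right)[0, 1])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # smooth a noise image to mimic the strong local correlations of natural images
    noise = rng.standard_normal((256, 256))
    kernel = np.ones(9) / 9.0
    smooth = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, noise)
    print("correlation of adjacent pixels:", adjacent_pixel_correlation(smooth))
```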
  25. 25. Describing signals with a simple statistical model Principle: Good codes capture the statistical distribution of sensory patterns. How do we describe the distribution? • Goal is to encode the data to desired precision: $x = a_1 s_1 + a_2 s_2 + \cdots + a_L s_L + \epsilon = \mathbf{A}\mathbf{s} + \epsilon$ • Can solve for the coefficients in the no-noise case: $\hat{\mathbf{s}} = \mathbf{A}^{-1}\mathbf{x}$ NIPS 2007 Tutorial 25 Michael S. Lewicki ! Carnegie Mellon
  26. 26. An algorithm for deriving efficient linear codes: ICA Learning objective: maximize coding efficiency by maximizing $P(\mathbf{x}|\mathbf{A})$ over $\mathbf{A}$. Using independent component analysis (ICA) to optimize $\mathbf{A}$: $\Delta\mathbf{A} \propto \mathbf{A}\mathbf{A}^T \frac{\partial}{\partial \mathbf{A}} \log P(\mathbf{x}|\mathbf{A}) = -\mathbf{A}(\mathbf{z}\mathbf{s}^T + \mathbf{I})$, where $\mathbf{z} = (\log P(\mathbf{s}))'$. Probability of the pattern ensemble is $P(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_N|\mathbf{A}) = \prod_k P(\mathbf{x}_k|\mathbf{A})$. To obtain $P(\mathbf{x}|\mathbf{A})$, marginalize over $\mathbf{s}$: $P(\mathbf{x}|\mathbf{A}) = \int d\mathbf{s}\, P(\mathbf{x}|\mathbf{A},\mathbf{s})\,P(\mathbf{s})$, which in the noiseless case reduces to $P(\mathbf{s})/|\det \mathbf{A}|$. This learning rule: • learns the features that capture the most structure • optimizes the efficiency of the code. What should we use for $P(\mathbf{s})$? NIPS 2007 Tutorial 26 Michael S. Lewicki ! Carnegie Mellon
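A small sketch of the natural-gradient ICA update above, assuming a Laplacian prior on the coefficients (so z = -sign(s)); the toy mixing matrix, batch size, and learning rate are illustrative, not the tutorial's settings.

```python
import numpy as np

def ica_update(A, X, lr=0.02):
    """One natural-gradient step on the basis A: dA ~ -A(z s^T + I), z = dlogP(s)/ds."""
    S = np.linalg.solve(A, X)                     # s = A^{-1} x for each data column
    n = X.shape[1]
    # for a Laplacian prior z = -sign(s), so the update becomes A (E[sign(s) s^T] - I)
    dA = A @ (np.sign(S) @ S.T / n - np.eye(A.shape[0]))
    return A + lr * dA

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S_true = rng.laplace(size=(4, 50_000))        # sparse sources
    A_true = rng.standard_normal((4, 4))          # unknown mixing matrix
    X = A_true @ S_true
    A = np.eye(4)
    for _ in range(3000):
        cols = rng.integers(0, X.shape[1], size=256)
        A = ica_update(A, X[:, cols])
    # A^{-1} A_true should approach a scaled permutation matrix
    print(np.round(np.linalg.solve(A, A_true), 2))
```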
  27. 27. Modeling Non-Gaussian distributions • Typical coeff. distributions of natural signals are non-Gaussian. [histograms of coefficients $s_1$ and $s_2$: sharply peaked, heavy-tailed marginals] NIPS 2007 Tutorial 27 Michael S. Lewicki ! Carnegie Mellon
  28. 28. The generalized Gaussian distribution $P(x|q) \propto \exp(-\tfrac{1}{2}|x|^q)$ • Or, equivalently, an exponential power distribution (Box and Tiao, 1973): $P(x|\mu,\sigma,\beta) = \frac{\omega(\beta)}{\sigma} \exp\!\left(-c(\beta)\left|\frac{x-\mu}{\sigma}\right|^{2/(1+\beta)}\right)$ • $\beta$ varies monotonically with the kurtosis $\kappa_2$: $\beta=-0.75$, $\kappa_2=-1.08$; $\beta=-0.25$, $\kappa_2=-0.45$; $\beta=+0.00$ (Normal), $\kappa_2=+0.00$; $\beta=+0.50$ (ICA tanh), $\kappa_2=+1.21$; $\beta=+1.00$ (Laplacian), $\kappa_2=+3.00$; $\beta=+2.00$, $\kappa_2=+9.26$ NIPS 2007 Tutorial 28 Michael S. Lewicki ! Carnegie Mellon
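A short sketch that evaluates the generalized Gaussian density $P(x|q) \propto \exp(-\tfrac{1}{2}|x|^q)$; the normalization constant is filled in from the standard form, and the kurtosis check at the end is a generic Monte Carlo estimate rather than the slide's table.

```python
import math
import numpy as np

def gen_gaussian_pdf(x, q):
    """Density proportional to exp(-|x|^q / 2), with the analytic normalizer."""
    c = q / (2.0 ** (1.0 + 1.0 / q) * math.gamma(1.0 / q))
    return c * np.exp(-0.5 * np.abs(x) ** q)

if __name__ == "__main__":
    x = np.linspace(-60, 60, 40001)
    for q in (0.7, 1.0, 2.0):                    # sparse, Laplacian-like, Gaussian
        print(f"q = {q}: integrates to {np.trapz(gen_gaussian_pdf(x, q), x):.3f}")
    # heavier tails than a Gaussian give positive excess kurtosis
    s = np.random.default_rng(0).laplace(size=200_000)
    print("excess kurtosis of Laplacian samples:",
          round(float(np.mean(s**4) / np.mean(s**2) ** 2 - 3), 2))
```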
  29. 29. Modeling Gaussian distributions with PCA • Principal component analysis (PCA) describes the principal axes of variation in the data distribution. • This is equivalent to fitting the data with a multivariate Gaussian. [Gaussian fits to the marginals of $s_1$ and $s_2$] NIPS 2007 Tutorial 29 Michael S. Lewicki ! Carnegie Mellon
  30. 30. Modeling non-Gaussian distributions • What about non-Gaussian marginals? [heavy-tailed marginals of $s_1$ and $s_2$] • How would this distribution be modeled by PCA? NIPS 2007 Tutorial 30 Michael S. Lewicki ! Carnegie Mellon
  31. 31. Modeling non-Gaussian distributions • What about non-Gaussian marginals? [heavy-tailed marginals of $s_1$ and $s_2$] • How would this distribution be modeled by PCA? • How should the distribution be described? The non-orthogonal ICA solution captures the non-Gaussian structure. NIPS 2007 Tutorial 31 Michael S. Lewicki ! Carnegie Mellon
  32. 32. Efficient coding of natural images: Olshausen and Field, 1996 [diagram: natural scene → visual input → units; basis functions and receptive fields shown before and after learning] Network weights are adapted to maximize coding efficiency: minimize redundancy and maximize the independence of the outputs. NIPS 2007 Tutorial 32 Michael S. Lewicki ! Carnegie Mellon
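A compact sketch in the spirit of Olshausen and Field (1996): infer sparse coefficients for each patch, then nudge the basis along the reconstruction error. The ISTA-style inference, random stand-in patches, and step sizes are assumptions for illustration, not the original algorithm's exact procedure.

```python
import numpy as np

def infer_sparse(Phi, x, lam=0.1, n_iter=50):
    """Simple ISTA: minimize 0.5 * ||x - Phi a||^2 + lam * ||a||_1."""
    L = np.linalg.norm(Phi, 2) ** 2                 # Lipschitz constant of the gradient
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        a = a - Phi.T @ (Phi @ a - x) / L
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)   # soft threshold
    return a

def train_basis(patches, n_basis=64, lr=0.1, n_epochs=3, seed=0):
    """Alternate sparse inference with a Hebbian-like update of the basis functions."""
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((patches.shape[1], n_basis))
    Phi /= np.linalg.norm(Phi, axis=0)
    for _ in range(n_epochs):
        for x in patches:
            a = infer_sparse(Phi, x)
            Phi += lr * np.outer(x - Phi @ a, a)    # move basis along the residual
            Phi /= np.linalg.norm(Phi, axis=0)      # keep basis vectors unit norm
    return Phi

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    patches = rng.standard_normal((200, 64))        # stand-in for whitened 8x8 image patches
    Phi = train_basis(patches)
    print("learned basis shape:", Phi.shape)
```

Trained on whitened natural-image patches rather than noise, the columns of Phi become the localized, oriented functions shown on the following slides.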
  33. 33. Model predicts local and global receptive field properties Learned basis for natural images Overlaid basis function properties from Lewicki and Olshausen, 1999NIPS 2007 Tutorial 33 Michael S. Lewicki ! Carnegie Mellon
  34. 34. Algorithm selects best of many possible sensory codes [example basis functions: Haar, wavelet, Gabor, Fourier, PCA, and learned] Theoretical perspective: not edge “detectors,” but an efficient code for natural images. from Lewicki and Olshausen, 1999 NIPS 2007 Tutorial 34 Michael S. Lewicki ! Carnegie Mellon
  35. 35. Comparing coding efficiency on natural images [bar chart: estimated bits per pixel for Gabor, wavelet, Haar, Fourier, PCA, and learned codes] NIPS 2007 Tutorial 35 Michael S. Lewicki ! Carnegie Mellon
  36. 36. Responses in primary visual cortex to visual motion from Wandell, 1995NIPS 2007 Tutorial 36 Michael S. Lewicki ! Carnegie Mellon
  37. 37. Sparse coding of time-varying images (Olshausen, 2002) $I(x,y,t) = \sum_i \sum_{t'} a_i(t')\,\phi_i(x,y,t-t') + \epsilon(x,y,t) = \sum_i a_i(t) \ast \phi_i(x,y,t) + \epsilon(x,y,t)$ NIPS 2007 Tutorial 37 Michael S. Lewicki ! Carnegie Mellon
  38. 38. Sparse decomposition of image sequences [figure: input sequence, convolution with the basis functions, posterior maximum of the coefficients, and reconstruction] from Olshausen, 2002 NIPS 2007 Tutorial 38 Michael S. Lewicki ! Carnegie Mellon
  39. 39. Learned spatio-temporal basis functions [each basis function shown as a sequence of frames over time] from Olshausen, 2002 NIPS 2007 Tutorial 39 Michael S. Lewicki ! Carnegie Mellon
  40. 40. Animated spatial-temporal basis functions from Olshausen, 2002NIPS 2007 Tutorial 40 Michael S. Lewicki ! Carnegie Mellon
  41. 41. Theory: ✓ explains data from principles; ✓ requires idealization and abstraction. Principle: ✓ code signals accurately and efficiently; ✓ adapted to natural sensory environment. Idealization: ✓ cell response is linear. Methodology: ✓ information theory, natural images. Prediction: ✓ explains individual receptive fields; ✓ explains population organization. NIPS 2007 Tutorial 41 Michael S. Lewicki ! Carnegie Mellon
  42. 42. How general is the efficient coding principle? Can it explain auditory coding?NIPS 2007 Tutorial 42 Michael S. Lewicki ! Carnegie Mellon
  43. 43. Limitations of the linear model • linear • only optimal for block, not whole signal • no phase-locking • representation depends on block alignmentNIPS 2007 Tutorial Michael S. Lewicki ! Carnegie Mellon
  44. 44. A continuous filterbank does not form an efficient code [diagram: signal convolved with a continuous filterbank] Goal: find a representation that is both time-relative and efficient NIPS 2007 Tutorial 44 Michael S. Lewicki ! Carnegie Mellon
  45. 45. Efficient signal representation using time-shiftable kernels (spikes) Smith and Lewicki (2005) Neural Comp. 17:19-45 $x(t) = \sum_{m=1}^{M} \sum_{i=1}^{n_m} s_{m,i}\, \phi_m(t - \tau_{m,i}) + \epsilon(t)$ • Each spike encodes the precise time and magnitude of an acoustic feature • Two important theoretical abstractions for “spikes”: not binary, not probabilistic • Can convert to a population of stochastic, binary spikes NIPS 2007 Tutorial 45 Michael S. Lewicki ! Carnegie Mellon
  46. 46. Coding audio signals with spikes [spikegram: kernel CF (Hz) vs. time (ms), with input, reconstruction, and residual waveforms] NIPS 2007 Tutorial 46 Michael S. Lewicki ! Carnegie Mellon
  47. 47. Comparing a spike code to a spectrogram [panels: (a) waveform, (b) spike code — kernel CF (Hz) vs. time (ms), (c) spectrogram — frequency (Hz) vs. time (ms)] How do we compute the spikes? NIPS 2007 Tutorial 47 Michael S. Lewicki ! Carnegie Mellon
  48. 48. “can” with filter-threshold [spikegram: kernel CF (Hz) vs. time (ms), with input, reconstruction, and residual] NIPS 2007 Tutorial 48 Michael S. Lewicki ! Carnegie Mellon
  49. 49. Spike Coding with Matching Pursuit 1. convolve signal with kernelsNIPS 2007 Tutorial 49 Michael S. Lewicki ! Carnegie Mellon
  50. 50. Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution setNIPS 2007 Tutorial 50 Michael S. Lewicki ! Carnegie Mellon
  51. 51. Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution set 3. fit signal with kernelNIPS 2007 Tutorial 51 Michael S. Lewicki ! Carnegie Mellon
  52. 52. Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution set 3. fit signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutionsNIPS 2007 Tutorial 52 Michael S. Lewicki ! Carnegie Mellon
  53. 53. Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution set 3. fit signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeatNIPS 2007 Tutorial 53 Michael S. Lewicki ! Carnegie Mellon
  54. 54. Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution set 3. fit signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeatNIPS 2007 Tutorial 54 Michael S. Lewicki ! Carnegie Mellon
  55. 55. Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution set 3. fit signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeatNIPS 2007 Tutorial 55 Michael S. Lewicki ! Carnegie Mellon
  56. 56. Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution set 3. fit signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeatNIPS 2007 Tutorial 56 Michael S. Lewicki ! Carnegie Mellon
  57. 57. Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution set 3. fit signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeat . . .NIPS 2007 Tutorial 57 Michael S. Lewicki ! Carnegie Mellon
  58. 58. Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution set 3. fit signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeat . . . 6. halt when desired fidelity is reachedNIPS 2007 Tutorial 58 Michael S. Lewicki ! Carnegie Mellon
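A minimal, single-channel sketch of the matching-pursuit loop in steps 1–6 above; the toy decaying-sinusoid kernels and the fidelity threshold are placeholders rather than the gammatone or learned kernels used in the tutorial.

```python
import numpy as np

def matching_pursuit(x, kernels, snr_target_db=20.0, max_spikes=1000):
    """Greedy spike code: returns (kernel index, time, amplitude) triples and the residual."""
    residual = x.copy()
    spikes = []
    target_power = np.mean(x**2) / (10.0 ** (snr_target_db / 10.0))
    while len(spikes) < max_spikes and np.mean(residual**2) > target_power:
        best = None
        for m, phi in enumerate(kernels):                      # 1. correlate with each kernel
            corr = np.correlate(residual, phi, mode="valid")
            t = int(np.argmax(np.abs(corr)))                   # 2. largest peak for this kernel
            if best is None or abs(corr[t]) > abs(best[2]):
                best = (m, t, corr[t])
        m, t, s = best                                         # 3. fit signal with the best kernel
        residual[t:t + len(kernels[m])] -= s * kernels[m]      # 4. subtract it and record a spike
        spikes.append((m, t, float(s)))
    return spikes, residual                                    # 5-6. repeat until fidelity reached

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(64)
    kernels = [np.exp(-t / 20.0) * np.sin(2 * np.pi * f * t / 64.0) for f in (3, 7)]
    kernels = [k / np.linalg.norm(k) for k in kernels]         # unit-norm toy kernels
    signal = np.zeros(512)
    signal[100:164] += 1.5 * kernels[0]
    signal[300:364] -= 0.8 * kernels[1]
    signal += 0.005 * rng.standard_normal(512)
    spikes, residual = matching_pursuit(signal, kernels)
    print("spikes:", [(m, time, round(s, 2)) for m, time, s in spikes])
```

Here the correlations are simply recomputed on every pass; an efficient implementation would only adjust the convolutions near the subtracted kernel, as step 4 suggests.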
  59. 59. “can” 5 dB SNR, 36 spikes, 145 sp/sec [spikegram with input, reconstruction, and residual] NIPS 2007 Tutorial 59 Michael S. Lewicki ! Carnegie Mellon
  60. 60. “can” 10 dB SNR, 93 spikes, 379 sp/sec [spikegram with input, reconstruction, and residual] NIPS 2007 Tutorial 60 Michael S. Lewicki ! Carnegie Mellon
  61. 61. “can” 20 dB SNR, 391 spikes, 1700 sp/sec [spikegram with input, reconstruction, and residual] NIPS 2007 Tutorial 61 Michael S. Lewicki ! Carnegie Mellon
  62. 62. “can” 40 dB SNR, 1285 spikes, 5238 sp/sec [spikegram with input, reconstruction, and residual] NIPS 2007 Tutorial 62 Michael S. Lewicki ! Carnegie Mellon
  63. 63. Efficient auditory coding with optimized kernel shapes Smith and Lewicki (2006) Nature 439:978-982 What are the optimal kernel shapes? $x(t) = \sum_{m=1}^{M} \sum_{i=1}^{n_m} s_{m,i}\, \phi_m(t - \tau_{m,i}) + \epsilon(t)$ Adapt algorithm of Olshausen (2002) NIPS 2007 Tutorial 63 Michael S. Lewicki ! Carnegie Mellon
  64. 64. Optimizing the probabilistic model
     $x(t) = \sum_{m=1}^{M} \sum_{i=1}^{n_m} s_i^m\, \phi_m(t - \tau_i^m) + \epsilon(t), \quad \epsilon(t) \sim \mathcal{N}(0, \sigma_\epsilon)$
     $p(x|\Phi) = \int p(x|\Phi, s, \tau)\, p(s)\, p(\tau)\, ds\, d\tau \approx p(x|\Phi, \hat{s}, \hat{\tau})\, p(\hat{s})\, p(\hat{\tau})$
     Learning (Olshausen, 2002):
     $\frac{\partial}{\partial \phi_m} \log p(x|\Phi) = \frac{\partial}{\partial \phi_m}\!\left[ \log p(x|\Phi, \hat{s}, \hat{\tau}) + \log p(\hat{s})\,p(\hat{\tau}) \right] = \frac{\partial}{\partial \phi_m} \frac{1}{2\sigma_\epsilon}\!\left[ x - \sum_{m=1}^{M}\sum_{i=1}^{n_m} \hat{s}_i^m\, \phi_m(t - \tau_i^m) \right]^2 = \frac{1}{\sigma_\epsilon} \sum_i \left[ x - \hat{x} \right] \hat{s}_i^m$
     Also adapt kernel lengths. NIPS 2007 Tutorial 64 Michael S. Lewicki ! Carnegie Mellon
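Given a spike decomposition (e.g., from a matching-pursuit encoder), the gradient above says each kernel should move toward the residual at the times and amplitudes where it was used. A hedged sketch of that update; the learning rate, unit-norm constraint, and dummy spikes are assumptions, not the exact procedure of Olshausen (2002) or Smith and Lewicki (2006).

```python
import numpy as np

def update_kernels(kernels, spikes, residual, lr=0.05):
    """Gradient step on kernel shapes: each kernel moves toward the residual,
    weighted by the amplitudes of the spikes that used it."""
    grads = [np.zeros_like(k) for k in kernels]
    for m, t, s in spikes:                        # spikes are (kernel index, time, amplitude)
        grads[m] += s * residual[t:t + len(kernels[m])]
    updated = []
    for k, g in zip(kernels, grads):
        k = k + lr * g
        updated.append(k / np.linalg.norm(k))     # keep kernels unit norm
    return updated

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    kernels = [rng.standard_normal(64) for _ in range(2)]
    kernels = [k / np.linalg.norm(k) for k in kernels]
    residual = rng.standard_normal(512)           # stand-in residual from an encoder
    spikes = [(0, 100, 1.2), (1, 300, -0.7)]
    kernels = update_kernels(kernels, spikes, residual)
    print([round(float(np.linalg.norm(k)), 3) for k in kernels])
```

In a full training loop one would alternate encoding a signal and applying this update.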
  65. 65. Adapting the optimal kernel shapes NIPS 2007 Tutorial 65 Michael S. Lewicki ! Carnegie Mellon
  66. 66. Kernel functions optimized for coding speech NIPS 2007 Tutorial 66 Michael S. Lewicki ! Carnegie Mellon
  67. 67. Quantifying coding efficiency
     1. fit signal
     2. quantize time and amplitude values
     3. prune zero values
     4. measure coding efficiency using the entropy of quantized values
     5. reconstruct signal using quantized values
     6. measure fidelity using signal-to-noise ratio (SNR) of residual error
     • identical procedure for other codes (e.g. Fourier and wavelet)
     [spikegram of the spike code; original, reconstruction, and residual error waveforms]
     $x(t) = \sum_{m=1}^{M} \sum_{i=1}^{n_m} s_{m,i}\, \phi_m(t - \tau_{m,i}) + \epsilon(t)$
     NIPS 2007 Tutorial 67 Michael S. Lewicki ! Carnegie Mellon
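The six-step procedure above can be mocked up directly: quantize spike times and amplitudes, estimate the rate from the entropy of the quantized values, and measure fidelity as the SNR of the residual. A sketch assuming simple uniform quantization with arbitrary placeholder bin widths:

```python
import numpy as np

def entropy_bits(symbols):
    """Empirical entropy (bits per symbol) of a sequence of discrete values."""
    _, counts = np.unique(symbols, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def coding_efficiency(x, x_hat, spike_times, spike_amps,
                      dt=1e-3, amp_step=0.05, duration_s=1.0):
    """Rate (bits/s) from the entropy of quantized spike values; fidelity as SNR (dB)."""
    t_q = np.round(np.asarray(spike_times) / dt).astype(int)
    a_q = np.round(np.asarray(spike_amps) / amp_step).astype(int)
    keep = a_q != 0                                  # prune zero-amplitude spikes
    t_q, a_q = t_q[keep], a_q[keep]
    bits_per_spike = entropy_bits(t_q) + entropy_bits(a_q)
    rate = bits_per_spike * len(a_q) / duration_s
    snr_db = 10.0 * np.log10(np.mean(x**2) / np.mean((x - x_hat) ** 2))
    return rate, snr_db

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1000)
    x_hat = x + 0.05 * rng.standard_normal(1000)     # stand-in reconstruction
    times = rng.uniform(0, 1, size=200)
    amps = rng.laplace(scale=0.5, size=200)
    rate, snr = coding_efficiency(x, x_hat, times, amps)
    print(f"rate = {rate:.0f} bits/s, SNR = {snr:.1f} dB")
```

Sweeping the quantization step traces out rate–fidelity curves like the ones on the next slide.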
  68. 68. Coding efficiency curves [plot: SNR (dB) vs. rate (kbps) for the adapted spike code, gammatone spike code, wavelet block code, and Fourier block code; the adapted spike code is about +14 dB, roughly 4x more efficient] NIPS 2007 Tutorial 68 Michael S. Lewicki ! Carnegie Mellon
  69. 69. Using efficient coding theory to make theoretical predictions [schematic: natural sound environment → (evolution / theory) → a simple model of auditory coding: sound waveform → filterbank → stochastic spiking → population spike code. The optimal kernels are compared against physiological data: auditory nerve non-linearities, filter shapes, filter properties, population trends, and coding efficiency. Auditory revcor filters are gammatones (deBoer and deJongh, 1978; Carney and Yin, 1988), and the prediction depends on the sound ensemble.] We only compare to the data after optimizing; we do not fit the data. More theoretical questions: Why gammatones? Why spikes? How is sound coded by the spike population? NIPS 2007 Tutorial 69 Michael S. Lewicki ! Carnegie Mellon
  70. 70. Learned kernels share features of auditory nerve filters Auditory nerve filters Optimized kernels from Carney, McDuffy, and Shekhter, 1999 scale bar = 1 msecNIPS 2007 Tutorial 70 Michael S. Lewicki ! Carnegie Mellon
  71. 71. Learned kernels closely match individual auditory nerve filters For each kernel, find the closest matching auditory nerve filter in Laurel Carney’s database of ~100 filters. NIPS 2007 Tutorial 71 Michael S. Lewicki ! Carnegie Mellon
  72. 72. Learned kernels overlaid on selected auditory nerve filters For almost all learned kernels there is a closely matching auditory nerve filter.NIPS 2007 Tutorial 72 Michael S. Lewicki ! Carnegie Mellon
  73. 73. Coding of a speech consonant [panels: (a) input, reconstruction, and residual waveforms; (b) spike code — kernel CF (Hz) vs. time; (c) spectrogram — frequency (Hz) vs. time (ms)] NIPS 2007 Tutorial 73 Michael S. Lewicki ! Carnegie Mellon
  74. 74. How is this achieving an efficient, time-relative code? Time-relative coding of glottal pulses [panels: waveform and spikegram — kernel CF (Hz) vs. time (ms)] NIPS 2007 Tutorial 74 Michael S. Lewicki ! Carnegie Mellon
  75. 75. Theory: ✓ explains data from principles; ✓ requires idealization and abstraction. Principle: ✓ code signals accurately and efficiently; ✓ adapted to natural sensory environment. Idealization: ✓ analog spikes. Methodology: ✓ information theory, natural sounds; ✓ optimization. Prediction: ✓ explains individual receptive fields; ✓ explains population organization. NIPS 2007 Tutorial 75 Michael S. Lewicki ! Carnegie Mellon
  76. 76. A gap in the theory? [center-surround receptive field (from Hubel, 1995) vs. learned oriented receptive fields] NIPS 2007 Tutorial 76 Michael S. Lewicki ! Carnegie Mellon
  77. 77. Redundancy reduction for noisy channels (Atick, 1992) [diagram: input $s$ → channel → $x$ → recoding $\mathbf{A}$ → channel → output $y$; $y = x + \nu$ and $y = \mathbf{A}x + \delta$] Mutual information: $I(x,s) = \sum_{s,x} P(x,s) \log_2 \frac{P(x,s)}{P(s)P(x)}$; $I(x,s) = 0$ iff $P(x,s) = P(x)P(s)$, i.e. $x$ and $s$ are independent. NIPS 2007 Tutorial 77 Michael S. Lewicki ! Carnegie Mellon
  78. 78. Profiles of optimal filters • high SNR: reduce redundancy → center-surround filters • low SNR: average → low-pass filter • matches behavior of retinal ganglion cells NIPS 2007 Tutorial 78 Michael S. Lewicki ! Carnegie Mellon
  79. 79. An observation: Contrast sensitivity of ganglion cells Luminance level decreases one log unit each time we go to lower curve.NIPS 2007 Tutorial 79 Michael S. Lewicki ! Carnegie Mellon
  80. 80. Natural images have a 1/f amplitude spectrum Field, 1987NIPS 2007 Tutorial 80 Michael S. Lewicki ! Carnegie Mellon
  81. 81. Components of predicted filters whitening filter low-pass filter optimal filter from Atick, 1992NIPS 2007 Tutorial 81 Michael S. Lewicki ! Carnegie Mellon
  82. 82. Predicted contrast sensitivity functions match neural data from Atick, 1992NIPS 2007 Tutorial 82 Michael S. Lewicki ! Carnegie Mellon
  83. 83. Robust coding of natural images • Theory refined: image is noisy and blurred; neural population size changes; neurons are noisy [panels: (a) undistorted image, (b) fovea retinal image, (c) 40 degrees eccentricity; intensity vs. visual angle (arc min); from Hubel, 1995 and Doi and Lewicki, 2006] NIPS 2007 Tutorial 83 Michael S. Lewicki ! Carnegie Mellon
  84. 84. Problem 1: Real neurons are “noisy” Estimates of neural information capacity:
     system (area) | stimulus | bits/sec | bits/spike
     fly visual (H1) | motion | 64 | ~1
     monkey visual (MT) | motion | 5.5 - 12 | 0.6 - 1.5
     frog auditory (auditory nerve) | noise & call | 46 & 133 | 1.4 & 7.8
     salamander visual (ganglion cells) | rand. spots | 3.2 | 1.6
     cricket cercal (sensory afferent) | mech. motion | 294 | 3.2
     cricket cercal (sensory afferent) | wind noise | 75 - 220 | 0.6 - 3.1
     cricket cercal (10-2 and 10-3) | wind noise | 8 - 80 | avg. = 1
     electric fish (P-afferent) | amp. modulation | 0 - 200 | 0 - 1.2
     After Borst and Theunissen, 1999. Limited capacity neural codes need to be robust. NIPS 2007 Tutorial 84 Michael S. Lewicki ! Carnegie Mellon
  85. 85. Traditional codes are not robust encoding neurons sensory input OriginalNIPS 2007 Tutorial 85 Michael S. Lewicki ! Carnegie Mellon
  86. 86. Traditional codes are not robust encoding neurons Add noise equivalent to 1 bit precision sensory input Original 1x efficient coding reconstruction (34% error) 1 bit precisionNIPS 2007 Tutorial 86 Michael S. Lewicki ! Carnegie Mellon
  87. 87. Redundancy plays an important role in neural coding [figure: redundancy of responses of salamander retinal ganglion cells to natural movies, plotted against cell pair distance (µm); redundancy was defined as the information conveyed jointly by a pair of cells about the stimulus as a fraction of the minimum of the individual cells' information, min{I(Ra;S), I(Rb;S)}] From Puchalla et al, 2005 NIPS 2007 Tutorial 87 Michael S. Lewicki ! Carnegie Mellon
  88. 88. Robust coding (Doi and Lewicki, 2005, 2006) [diagram: data $x$ → encoder $\mathbf{W}$ → noiseless representation $u$ → channel noise $n$ → noisy representation $r$ → decoder $\mathbf{A}$ → reconstruction $\hat{x}$] Robust coding introduces redundancy into the code so that the assumed channel noise is separated from the representation, unlike PCA or ICA; the apparent dimensionality can be increased (instead of reduced). Limited precision is modeled using additive channel noise: for $N$-dimensional, zero-mean data $x$, encoding matrix $\mathbf{W} \in \mathbb{R}^{M \times N}$, and decoding matrix $\mathbf{A} \in \mathbb{R}^{N \times M}$, $r = \mathbf{W}x + n = u + n$ with $n \sim \mathcal{N}(0, \sigma_n^2 \mathbf{I})$, and the reconstruction is $\hat{x} = \mathbf{A}r = \mathbf{A}\mathbf{W}x + \mathbf{A}n$. Caveat: the model is linear, but can be evaluated for different noise levels. NIPS 2007 Tutorial 88 Michael S. Lewicki ! Carnegie Mellon
  89. 89. Generalizing the model: sensory noise and optical blur [panels: (a) undistorted image, (b) fovea retinal image, (c) 40 degrees eccentricity] [diagram: image $s$ → optical blur $\mathbf{H}$ → observation $x$ (+ sensory noise $\nu$) → encoder $\mathbf{W}$ → representation $r$ (+ channel noise $\delta$) → decoder $\mathbf{A}$ → reconstruction $\hat{s}$; the optical blur is only implicit] Can also add sparseness and resource constraints NIPS 2007 Tutorial 89 Michael S. Lewicki ! Carnegie Mellon
  90. 90. Robust coding is distinct from “reading” noisy populations [figure from Pouget et al, 2001: the standard population coding model — bell-shaped tuning curves, a noisy population pattern of activity, population vector decoding, and maximum likelihood decoding] Here, we want to learn an optimal image code using noisy neurons. NIPS 2007 Tutorial 90 Michael S. Lewicki ! Carnegie Mellon
  91. 91. How do we learn robust codes? [diagram: image $s$ → optical blur $\mathbf{H}$ → observation $x$ → encoder $\mathbf{W}$ → representation $r$ → decoder $\mathbf{A}$ → reconstruction $\hat{s}$, with sensory noise $\nu$ and channel noise $\delta$]
     Objective: find $\mathbf{W}$ and $\mathbf{A}$ that minimize the reconstruction error.
     $(\text{Cost}) = (\text{Error}) + \lambda\,(\text{Var. Const.})$, with $(\text{Var. Const.}) = \sum_{i=1}^{M} \ln \frac{\langle u_i^2 \rangle}{\sigma_u^2}$
     $\Delta\mathbf{W} \propto -\frac{\partial}{\partial \mathbf{W}}(\text{Cost}) = 2\mathbf{A}^T(\mathbf{I}_N - \mathbf{A}\mathbf{W})\boldsymbol{\Sigma}_x - \lambda\,\frac{\partial}{\partial \mathbf{W}}(\text{Var. Const.})$
     Setting $\frac{\partial}{\partial \mathbf{A}}(\text{Cost}) = \mathbf{O}$ gives the optimal decoder $\mathbf{A} = \boldsymbol{\Sigma}_x \mathbf{W}^T (\sigma_n^2 \mathbf{I}_M + \mathbf{W}\boldsymbol{\Sigma}_x\mathbf{W}^T)^{-1}$.
     Channel capacity of the $i$-th neuron: $C_i = \frac{1}{2}\ln(\mathrm{SNR}_i + 1)$. To limit capacity, fix the coefficient signal-to-noise ratio $\mathrm{SNR}_i = \langle u_i^2 \rangle / \sigma_n^2$ (so that $\langle u_i^2 \rangle = \sigma_u^2$ for all units).
     Now robust coding is formulated as a constrained optimization problem. NIPS 2007 Tutorial 91 Michael S. Lewicki ! Carnegie Mellon
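A minimal sketch of the constrained optimization described above: the decoder is set to the optimal linear form for the current encoder, the encoder follows the gradient of the reconstruction error, and each unit is rescaled to hold the per-neuron SNR fixed. The alternating scheme, step size, and toy covariance are assumptions for illustration, not the exact algorithm of Doi and Lewicki.

```python
import numpy as np

def optimal_decoder(W, Sx, sigma_n2):
    """A = Sx W^T (sigma_n^2 I + W Sx W^T)^{-1}: best linear decoder for encoder W."""
    M = W.shape[0]
    return Sx @ W.T @ np.linalg.inv(sigma_n2 * np.eye(M) + W @ Sx @ W.T)

def reconstruction_error(W, A, Sx, sigma_n2):
    """E||x - A(Wx + n)||^2 for zero-mean x with covariance Sx and noise variance sigma_n2."""
    B = np.eye(Sx.shape[0]) - A @ W
    return float(np.trace(B @ Sx @ B.T) + sigma_n2 * np.trace(A @ A.T))

def learn_robust_code(Sx, M, snr=1.0, lr=0.05, n_iter=2000, seed=0):
    """Alternate a gradient step on W with renormalization that fixes each unit's SNR."""
    N = Sx.shape[0]
    sigma_n2 = 1.0
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((M, N))
    for _ in range(n_iter):
        A = optimal_decoder(W, Sx, sigma_n2)
        gradW = -2.0 * A.T @ (np.eye(N) - A @ W) @ Sx     # d(error)/dW with A held fixed
        W = W - lr * gradW
        var = np.diag(W @ Sx @ W.T)                       # per-unit output variance <u_i^2>
        W = W * np.sqrt(snr * sigma_n2 / var)[:, None]    # enforce SNR_i = <u_i^2>/sigma_n^2
    A = optimal_decoder(W, Sx, sigma_n2)
    return W, A, reconstruction_error(W, A, Sx, sigma_n2)

if __name__ == "__main__":
    Sx = np.diag([4.0, 1.0, 0.25])           # toy data covariance
    for M in (3, 6, 12):                      # more low-precision units -> lower error
        _, _, err = learn_robust_code(Sx, M)
        print(f"M = {M:2d} units: reconstruction error = {err:.3f}")
```

The demo mirrors the following slides: at a fixed, low per-neuron precision, the reconstruction improves as the number of encoding units grows.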
  92. 92. Robust coding of natural images encoding neurons Add noise equivalent to 1 bit precision sensory input Original 1x efficient coding reconstruction (34% error) 1 bit precisionNIPS 2007 Tutorial 92 Michael S. Lewicki ! Carnegie Mellon
  93. 93. Robust coding of natural images encoding neurons Weights adapted for optimal robustness sensory input Original 1x robust coding reconstruction (3.8% error) 1 bit precisionNIPS 2007 Tutorial 93 Michael S. Lewicki ! Carnegie Mellon
  94. 94. Reconstruction improves by adding neurons encoding neurons Weights adapted for optimal robustness sensory input Original 8x robust coding reconstruction error: 0.6% 1 bit precisionNIPS 2007 Tutorial 94 Michael S. Lewicki ! Carnegie Mellon
  95. 95. Adding precision to 1x robust coding original images 1 bit: 12.5% error 2 bit: 3.1% errorNIPS 2007 Tutorial 95 Michael S. Lewicki ! Carnegie Mellon
  96. 96. Can derive minimum theoretical average error bound. For $N$-dimensional data with covariance eigenvalues $\lambda_i$ (the eigenvalues of $\boldsymbol{\Sigma}_x$) and $M$ coding units (neurons), the minimum average error is $E = \frac{1}{N}\sum_{i=1}^{N} \frac{\lambda_i}{\frac{M}{N}\,\mathrm{SNR} + 1}$ when the SNR is above a critical value $\mathrm{SNR}_c$. Algorithm achieves the theoretical lower bound (results vs. bound): 0.5x: 20.3% vs. 19.9%; 1x: 12.5% vs. 12.4%; 8x: 2.0% vs. 2.0%. Balcan, Doi, and Lewicki, 2007; Balcan and Lewicki, 2007 NIPS 2007 Tutorial 96 Michael S. Lewicki ! Carnegie Mellon
  97. 97. Sparseness localizes the vectors and increases coding efficiency [encoding vectors (W), decoding vectors (A), and normalized histograms of coefficient kurtosis: robust coding, average kurtosis = 4.9; robust sparse coding, average kurtosis = 7.3] Average error is unchanged: 12.5% & 12.5%. NIPS 2007 Tutorial 97 Michael S. Lewicki ! Carnegie Mellon
  98. 98. Non-zero resource constraints localize weights [panels: weights learned at 20, 10, 0, and -10 dB SNR for the fovea and for 40 degrees eccentricity; magnitude and spatial frequency profiles] Average error is also unchanged. NIPS 2007 Tutorial 98 Michael S. Lewicki ! Carnegie Mellon
  99. 99. Theory: ✓ explains data from principles; ✓ requires idealization and abstraction. Principle: ✓ code signals accurately, efficiently, robustly; ✓ adapted to natural sensory environment. Idealization: ✓ cell response is linear, noise is additive; ✓ simple, additive resource constraints. Methodology: ✓ information theory, natural images; ✓ constrained optimization. Prediction: ✓ explains individual receptive fields; ✓ explains non-linear response; ✓ explains population organization. NIPS 2007 Tutorial 99 Michael S. Lewicki ! Carnegie Mellon
  100. 100. What happens next? [center-surround receptive field (from Hubel, 1995) vs. learned oriented receptive fields] NIPS 2007 Tutorial 100 Michael S. Lewicki ! Carnegie Mellon
  101. 101. What happens next?NIPS 2007 Tutorial 101 Michael S. Lewicki ! Carnegie Mellon
  102. 102. Joint statistics of filter outputs show magnitude dependence [conditional histograms of the response $L_2$ of one linear filter given the response $L_1$ of a neighboring filter, e.g. histo{$L_2$ | $L_1 \approx 0.1$} and histo{$L_2$ | $L_1 \approx 0.9$}: the responses are largely decorrelated but not statistically independent — the spread of $L_2$ increases with the magnitude of $L_1$] An automatic gain control known as ‘divisive normalization’ has been used to account for many nonlinear steady-state behaviors of neurons in primary visual cortex. from Schwartz and Simoncelli (2001) NIPS 2007 Tutorial 102 Michael S. Lewicki ! Carnegie Mellon
  103. 103. Using sensory gain control to reduce redundancy Schwartz and Simoncelli (2001) [diagram of a divisive normalization model: the squared response of a filter $F_1$ to the stimulus is divided by a weighted sum of the squared responses of the other filters plus a constant] Adapt the weights (by maximum likelihood) to minimize higher-order dependencies from natural images. NIPS 2007 Tutorial 103 Michael S. Lewicki ! Carnegie Mellon
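A small sketch of a divisive-normalization transform like the one described on this slide: each squared filter response is divided by a weighted sum of its neighbors' squared responses plus a constant. The hand-set weight and the toy responses are placeholders; Schwartz and Simoncelli fit the weights to natural signals by maximum likelihood, which this toy does not reproduce.

```python
import numpy as np

def divisive_normalization(L, w=0.1, sigma2=0.01):
    """R_i = L_i^2 / (sigma^2 + w * sum_{j != i} L_j^2), applied to each row of responses."""
    L2 = L ** 2
    pool = L2.sum(axis=1, keepdims=True) - L2      # squared responses of the *other* filters
    return L2 / (sigma2 + w * pool)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # toy responses of 10 neighboring filters that share a common magnitude envelope
    envelope = np.exp(0.5 * rng.standard_normal((20_000, 1)))
    L = envelope * rng.standard_normal((20_000, 10))
    R = divisive_normalization(L)
    before = np.corrcoef(L[:, 0] ** 2, L[:, 1] ** 2)[0, 1]
    after = np.corrcoef(R[:, 0], R[:, 1])[0, 1]
    print(f"dependence between squared responses: before = {before:.2f}, after = {after:.2f}")
```

Because the shared magnitude envelope appears in both the numerator and the normalization pool, dividing it out removes much of the dependence between the squared responses, which is the redundancy-reduction point of the slide.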