NIPS2007: sensory coding and hierarchical representations
Presentation Transcript

  • Sensory Coding & Hierarchical Representations Michael S. Lewicki Computer Science Department &Center for the Neural Basis of Cognition Carnegie Mellon University
  • [Figure: the visual hierarchy, retina → LGN → V1 → V2 → V4 → “IT”.] NIPS 2007 Tutorial 2 Michael S. Lewicki ! Carnegie Mellon
  • from Hegdé and Felleman, 2007: Fig. 1. Anatomical and functional hierarchies in the macaque visual system. The human visual system (not shown) is believed to be roughly similar. A, A schematic summary of the laminar patterns of feed-forward (or ascending) and feed-back (or descending) connections for visual area V1. The laminar patterns vary somewhat from one visual area to the next, but in general the connections are complementary, so that the ascending connections terminate in the granular layer (layer 4) and the descending connections avoid it. The connections are generally reciprocal, in that an area that sends feed-forward connections to another area also receives feedback connections from it. The visual anatomical hierarchy is defined based on, among other things, the laminar patterns of these interconnections among the various areas. B, A simplified version of the visual anatomical hierarchy in the macaque monkey. For the complete version, see Felleman and Van Essen (1991). AIT = anterior inferotemporal; LGN = lateral geniculate nucleus; LIP = lateral intraparietal; MT = middle temporal; MST = medial superior temporal; PIT = posterior inferotemporal; V1 = visual area 1; V2 = visual area 2; V4 = visual area 4; VIP = ventral intraparietal. C, A model of hierarchical processing of visual information proposed by David Marr (1982). D, A schematic illustration of the presumed parallels between the anatomical and functional hierarchies. It is widely presumed not only that visual processing is hierarchical but also that... NIPS 2007 Tutorial 3 Michael S. Lewicki ! Carnegie Mellon
  • Palmer and Rock, 1994NIPS 2007 Tutorial 4 Michael S. Lewicki ! Carnegie Mellon
  • V1 simple cells. [Figure: simple-cell receptive fields; Hubel and Wiesel, 1959; DeAngelis et al., 1995.] NIPS 2007 Tutorial 5 Michael S. Lewicki ! Carnegie Mellon
  • NIPS 2007 Tutorial 6 Michael S. Lewicki ! Carnegie Mellon
  • NIPS 2007 Tutorial 7 Michael S. Lewicki ! Carnegie Mellon
  • Out of the retina • > 23 distinct neural pathways, no simple functional division • Suprachiasmatic nucleus: circadian rhythm • Accessory optic system: stabilize retinal image • Superior colliculus: integrate visual and auditory information with head movements, direct eyes • Pretectum: plays a role in adjusting pupil size and tracking large moving objects • Pregeniculate: cells responsive to ambient light • Lateral geniculate (LGN): main “relay” to visual cortex; contains 6 distinct layers, each with 2 sublayers. Organization is very complex. NIPS 2007 Tutorial 8 Michael S. Lewicki ! Carnegie Mellon
  • V1 simple cell integration of LGN cells. [Figure: LGN cells with aligned receptive fields converging onto a V1 simple cell; Hubel and Wiesel, 1963.] NIPS 2007 Tutorial 9 Michael S. Lewicki ! Carnegie Mellon
  • Hyper-column model Hubel and Wiesel, 1978NIPS 2007 Tutorial 10 Michael S. Lewicki ! Carnegie Mellon
  • Anatomical circuitry in V1 (from Callaway, 1998, Figure 1): contributions of individual neurons to local excitatory connections between cortical layers. From left to right, typical spiny neurons in layers 4C, 2–4B, 5, and 6 are shown. Dendritic arbors are illustrated with thick lines and axonal arbors with finer lines. Each cell projects axons specifically to only a subset of the layers. This simplified diagram focuses on connections between layers 2–4B, 4C, 5, and 6 and does not incorporate details of circuitry specific for subdivisions of these layers. A model of the interactions between these neurons is shown in Figure 2. NIPS 2007 Tutorial 11 Michael S. Lewicki ! Carnegie Mellon
  • Where is this headed? Ramon y CajalNIPS 2007 Tutorial 12 Michael S. Lewicki ! Carnegie Mellon
  • “A wing would be a most mystifying structure if one did not know that birds flew.” Horace Barlow, 1961. “An algorithm is likely to be understood more readily by understanding the nature of the problem being solved than by examining the mechanism in which it is embodied.” David Marr, 1982
  • A theoretical approach • Look at the system from a functional perspective: What problems does it need to solve? • Abstract from the details: Make predictions from theoretical principles. • Models are bottom-up; theories are top-down. What are the relevant computational principles?NIPS 2007 Tutorial 14 Michael S. Lewicki ! Carnegie Mellon
  • Representing structure in natural images image from Field (1994)NIPS 2007 Tutorial 15 Michael S. Lewicki ! Carnegie Mellon
  • Representing structure in natural images image from Field (1994)NIPS 2007 Tutorial 16 Michael S. Lewicki ! Carnegie Mellon
  • Representing structure in natural images image from Field (1994)NIPS 2007 Tutorial 17 Michael S. Lewicki ! Carnegie Mellon
  • Representing structure in natural images Need to describe all image structure in the scene. What representation is best? image from Field (1994)NIPS 2007 Tutorial 18 Michael S. Lewicki ! Carnegie Mellon
  • V1 simple cells. [Figure: simple-cell receptive fields; Hubel and Wiesel, 1959; DeAngelis et al., 1995.] NIPS 2007 Tutorial 19 Michael S. Lewicki ! Carnegie Mellon
  • Oriented Gabor models of individual simple cells figure from Daugman, 1990; data from Jones and Palmer, 1987NIPS 2007 Tutorial 20 Michael S. Lewicki ! Carnegie Mellon
  • 2D Gabor wavelet captures spatial structure • 2D Gabor functions • Wavelet basis generated by dilations, translations, and rotations of a single basis function • Can also control phase and aspect ratio • (drifting) Gabor functions are what the eye “sees best”NIPS 2007 Tutorial 21 Michael S. Lewicki ! Carnegie Mellon
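A minimal sketch of the 2D Gabor function described on this slide: a Gaussian envelope multiplying an oriented sinusoidal carrier. The function and parameter names (sigma, wavelength, theta, phase, aspect) are illustrative choices, not taken from the tutorial; dilating, translating, and rotating this single function generates the wavelet family mentioned above.

```python
import numpy as np

def gabor_2d(size=32, sigma=4.0, wavelength=8.0, theta=0.0, phase=0.0, aspect=1.0):
    """Evaluate a 2D Gabor function on a size x size grid."""
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half].astype(float)
    # rotate coordinates into the filter's frame
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr**2 + (aspect * yr)**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * xr / wavelength + phase)
    return envelope * carrier

g = gabor_2d(theta=np.pi / 4)   # a 45-degree oriented filter
print(g.shape, round(float(g.max()), 3))
```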
  • Recoding with Gabor functions Pixel entropy = 7.57 bits Recoding with 2D Gabor functions Coefficient entropy = 2.55 bitsNIPS 2007 Tutorial 22 Michael S. Lewicki ! Carnegie Mellon
  • How is the V1 population organized?NIPS 2007 Tutorial 23 Michael S. Lewicki ! Carnegie Mellon
  • A general approach to coding: redundancy reduction. [Figure: scatter plot of the correlation of adjacent pixels.] Redundancy reduction is equivalent to efficient coding. NIPS 2007 Tutorial 24 Michael S. Lewicki ! Carnegie Mellon
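To make the redundancy in the slide's scatter plot concrete, the sketch below estimates the correlation between horizontally adjacent pixels of a grayscale image array. The synthetic image is only a stand-in; for natural images the correlation is typically above 0.9.

```python
import numpy as np

def adjacent_pixel_correlation(img):
    """Correlation coefficient between each pixel and its right-hand neighbor."""
    left = img[:, :-1].ravel()
    right = img[:, 1:].ravel()
    return np.corrcoef(left, right)[0, 1]

# Stand-in "image": a smooth random-walk texture (natural images behave similarly).
img = np.cumsum(np.random.randn(256, 256), axis=1)
print(adjacent_pixel_correlation(img))
```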
  • Describing signals with a simple statistical model. Principle: Good codes capture the statistical distribution of sensory patterns. How do we describe the distribution? • Goal is to encode the data to desired precision: x = a1 s1 + a2 s2 + · · · + aL sL + ε = As + ε • Can solve for the coefficients in the no-noise case: ŝ = A⁻¹ x NIPS 2007 Tutorial 25 Michael S. Lewicki ! Carnegie Mellon
  • An algorithm for deriving efficient linear codes: ICA. Using independent component analysis (ICA) to optimize A. Learning objective: maximize coding efficiency, i.e. maximize P(x|A) over A. The probability of the pattern ensemble is P(x1, x2, ..., xN |A) = ∏k P(xk|A). To obtain P(x|A), marginalize over s: P(x|A) = ∫ ds P(x|A, s) P(s) = P(s) / |det A|. The learning rule is ∆A ∝ A Aᵀ (∂/∂A) log P(x|A) = −A(z sᵀ − I), where z = (log P(s))′. This learning rule: • learns the features that capture the most structure • optimizes the efficiency of the code. What should we use for P(s)? NIPS 2007 Tutorial 26 Michael S. Lewicki ! Carnegie Mellon
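The update on this slide can be run in a few lines. The sketch below uses the equivalent unmixing-matrix (W = A⁻¹) form of the natural-gradient rule, ΔW ∝ (I + z sᵀ)W with z = (log P(s))′, and assumes a sparse prior for which z = −tanh(s); the mixing matrix, learning rate, and batch size are illustrative, not values from the tutorial.

```python
import numpy as np

def ica_natural_gradient(X, n_iters=2000, lr=0.01, batch=100, seed=0):
    """Natural-gradient ICA on X (n_dims x n_samples); returns unmixing W and basis A = inv(W)."""
    rng = np.random.default_rng(seed)
    n_dims, n_samples = X.shape
    W = np.eye(n_dims)
    for _ in range(n_iters):
        x = X[:, rng.integers(n_samples, size=batch)]
        s = W @ x
        z = -np.tanh(s)                         # score function of a sparse prior
        W += lr * (np.eye(n_dims) + (z @ s.T) / batch) @ W
    return W, np.linalg.inv(W)

# Toy demo: mix two Laplacian sources, then try to recover the mixing.
rng = np.random.default_rng(1)
S = rng.laplace(size=(2, 5000))
A_true = np.array([[1.0, 0.6], [0.4, 1.0]])
W, A_est = ica_natural_gradient(A_true @ S)
print(np.round(W @ A_true, 2))   # should approach a scaled permutation matrix
```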
  • Modeling non-Gaussian distributions • Typical coefficient distributions of natural signals are non-Gaussian. [Figure: non-Gaussian marginal histograms of s1 and s2.] NIPS 2007 Tutorial 27 Michael S. Lewicki ! Carnegie Mellon
  • The generalized Gaussian distribution: P(x|q) ∝ exp(−½ |x|^q) • Or equivalently, an exponential power distribution (Box and Tiao, 1973): P(x|µ, σ, β) = (ω(β)/σ) exp(−c(β) |(x − µ)/σ|^(2/(1+β))) • β varies monotonically with the kurtosis κ2: β = −0.75, κ2 = −1.08; β = −0.25, κ2 = −0.45; β = +0.00 (Normal), κ2 = +0.00; β = +0.50 (ICA tanh), κ2 = +1.21; β = +1.00 (Laplacian), κ2 = +3.00; β = +2.00, κ2 = +9.26. NIPS 2007 Tutorial 28 Michael S. Lewicki ! Carnegie Mellon
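The monotonic relationship between the exponent and the kurtosis can be checked numerically. This sketch evaluates the generalized Gaussian P(x|q) ∝ exp(−½|x|^q) on a grid and computes its excess kurtosis; the grid limits are an arbitrary (but generous) choice, not part of the tutorial.

```python
import numpy as np

def excess_kurtosis(q, lim=100.0, n=200_001):
    """Excess kurtosis of p(x|q) proportional to exp(-0.5 |x|^q), via a discretized density."""
    x = np.linspace(-lim, lim, n)
    w = np.exp(-0.5 * np.abs(x) ** q)
    p = w / w.sum()                 # discretized, normalized density
    m2 = np.sum(p * x**2)
    m4 = np.sum(p * x**4)
    return m4 / m2**2 - 3.0

for q in (1.0, 2.0, 4.0):
    # q = 1 (Laplacian) gives ~+3, q = 2 (Gaussian) gives ~0, q = 4 (light tails) is negative
    print(q, round(excess_kurtosis(q), 2))
```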
  • Modeling Gaussian distributions with PCA • Principal component analysis (PCA) describes the principal axes of variation in the data distribution. • This is equivalent to fitting the data with a multivariate Gaussian. [Figure: Gaussian marginal histograms of s1 and s2.] NIPS 2007 Tutorial 29 Michael S. Lewicki ! Carnegie Mellon
  • Modeling non-Gaussian distributions • What about non-Gaussian marginals? [Figure: non-Gaussian marginal histograms of s1 and s2.] • How would this distribution be modeled by PCA? NIPS 2007 Tutorial 30 Michael S. Lewicki ! Carnegie Mellon
  • Modeling non-Gaussian distributions • What about non-Gaussian marginals? [Figure: non-Gaussian marginal histograms of s1 and s2.] • How would this distribution be modeled by PCA? • How should the distribution be described? The non-orthogonal ICA solution captures the non-Gaussian structure. NIPS 2007 Tutorial 31 Michael S. Lewicki ! Carnegie Mellon
  • Efficient coding of natural images: Olshausen and Field, 1996. [Figure: natural scene → visual input → units; image basis functions (receptive fields) before and after learning.] Network weights are adapted to maximize coding efficiency: this minimizes redundancy and maximizes the independence of the outputs. NIPS 2007 Tutorial 32 Michael S. Lewicki ! Carnegie Mellon
  • Model predicts local and global receptive field properties Learned basis for natural images Overlaid basis function properties from Lewicki and Olshausen, 1999NIPS 2007 Tutorial 33 Michael S. Lewicki ! Carnegie Mellon
  • Algorithm selects best of many possible sensory codes: Haar, learned, wavelet, Gabor, Fourier, PCA. Theoretical perspective: not edge “detectors” but an efficient code for natural images. from Lewicki and Olshausen, 1999. NIPS 2007 Tutorial 34 Michael S. Lewicki ! Carnegie Mellon
  • Comparing coding efficiency on natural images. [Bar chart: estimated bits per pixel for Gabor, wavelet, Haar, Fourier, PCA, and learned codes.] NIPS 2007 Tutorial 35 Michael S. Lewicki ! Carnegie Mellon
  • Responses in primary visual cortex to visual motion from Wandell, 1995NIPS 2007 Tutorial 36 Michael S. Lewicki ! Carnegie Mellon
  • Sparse coding of time-varying images (Olshausen, 2002): I(x, y, t) = Σi Σt′ ai(t′) φi(x, y, t − t′) + ε(x, y, t) = Σi ai(t) ∗ φi(x, y, t) + ε(x, y, t). NIPS 2007 Tutorial 37 Michael S. Lewicki ! Carnegie Mellon
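A direct (shift-and-add) implementation of the convolutional generative model above; the array shapes and the toy kernels are illustrative assumptions, not the tutorial's.

```python
import numpy as np

def reconstruct_movie(coeffs, kernels):
    """I(x, y, t) = sum_i sum_t' a_i(t') phi_i(x, y, t - t'): shift-and-add reconstruction.

    coeffs  : (n_kernels, T) coefficient time series a_i(t); sparse, mostly zero
    kernels : (n_kernels, H, W, K) spatiotemporal basis functions phi_i
    """
    n_kernels, T = coeffs.shape
    _, H, W, K = kernels.shape
    movie = np.zeros((H, W, T + K - 1))
    for i in range(n_kernels):
        for t in np.flatnonzero(coeffs[i]):
            movie[:, :, t:t + K] += coeffs[i, t] * kernels[i]
    return movie

# Tiny demo with random kernels and a sparse coefficient train.
rng = np.random.default_rng(0)
kernels = rng.standard_normal((4, 8, 8, 5))
coeffs = np.zeros((4, 50))
coeffs[rng.integers(4, size=10), rng.integers(50, size=10)] = rng.standard_normal(10)
print(reconstruct_movie(coeffs, kernels).shape)   # (8, 8, 54)
```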
  • Sparse decomposition of image sequences. [Figure: kernel convolutions, posterior maximum (spike times and amplitudes), reconstruction, and input sequence.] from Olshausen, 2002. NIPS 2007 Tutorial 38 Michael S. Lewicki ! Carnegie Mellon
  • Learned spatio-temporal basis functions. [Figure: each basis function shown as a sequence of frames over time.] from Olshausen, 2002. NIPS 2007 Tutorial 39 Michael S. Lewicki ! Carnegie Mellon
  • Animated spatial-temporal basis functions from Olshausen, 2002NIPS 2007 Tutorial 40 Michael S. Lewicki ! Carnegie Mellon
  • Theory: explains data from principles; requires idealization and abstraction. Principle: code signals accurately and efficiently; adapted to natural sensory environment. Idealization: cell response is linear. Methodology: information theory, natural images. Prediction: explains individual receptive fields; explains population organization. NIPS 2007 Tutorial 41 Michael S. Lewicki ! Carnegie Mellon
  • How general is the efficient coding principle? Can it explain auditory coding?NIPS 2007 Tutorial 42 Michael S. Lewicki ! Carnegie Mellon
  • Limitations of the linear model • linear • only optimal for block, not whole signal • no phase-locking • representation depends on block alignmentNIPS 2007 Tutorial Michael S. Lewicki ! Carnegie Mellon
  • A continuous filterbank does not form an efficient code. [Figure: convolution of a signal with a filterbank yields a highly redundant representation.] Goal: find a representation that is both time-relative and efficient. NIPS 2007 Tutorial 44 Michael S. Lewicki ! Carnegie Mellon
  • Efficient signal representation using time-shiftable kernels (spikes). Smith and Lewicki (2005) Neural Comp. 17:19-45. x(t) = Σm Σi s_{m,i} φm(t − τ_{m,i}) + ε(t), with m = 1…M kernels and i = 1…n_m spikes per kernel • Each spike encodes the precise time and magnitude of an acoustic feature • Two important theoretical abstractions for “spikes”: not binary, not probabilistic • Can convert to a population of stochastic, binary spikes. NIPS 2007 Tutorial 45 Michael S. Lewicki ! Carnegie Mellon
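The generative model above is easy to evaluate: given a list of spikes (kernel index, time, amplitude), the signal is the sum of amplitude-scaled, time-shifted kernels. The gammatone-like toy kernels below are placeholders, not the learned kernels from the paper.

```python
import numpy as np

def decode_spikes(spikes, kernels, length):
    """Reconstruct x(t) = sum_m sum_i s_{m,i} phi_m(t - tau_{m,i}) from a spike list.

    spikes  : list of (kernel_index m, time tau, amplitude s)
    kernels : list of 1-D arrays phi_m
    length  : number of samples in the reconstructed signal
    """
    x = np.zeros(length)
    for m, tau, s in spikes:
        phi = kernels[m]
        end = min(tau + len(phi), length)
        x[tau:end] += s * phi[:end - tau]
    return x

# Toy example: two decaying-sinusoid kernels and three spikes.
t = np.arange(64)
kernels = [np.sin(2 * np.pi * f * t / 64) * np.exp(-t / 10.0) for f in (2, 5)]
spikes = [(0, 10, 1.0), (1, 40, -0.5), (0, 100, 0.8)]
print(decode_spikes(spikes, kernels, 200)[:5])
```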
  • Coding audio signals with spikes. [Spikegram: kernel CF (Hz) vs. time (ms), with input, reconstruction, and residual waveforms.] NIPS 2007 Tutorial 46 Michael S. Lewicki ! Carnegie Mellon
  • Comparing a spike code to a spectrogram. [Figure panels a–c: spikegram (kernel CF vs. time) and spectrogram (frequency vs. time) of the same ~800 ms of sound.] How do we compute the spikes? NIPS 2007 Tutorial 47 Michael S. Lewicki ! Carnegie Mellon
  • “can” with filter-threshold. [Spikegram: kernel CF (Hz) vs. time (ms), with input, reconstruction, and residual waveforms.] NIPS 2007 Tutorial 48 Michael S. Lewicki ! Carnegie Mellon
  • Spike Coding with Matching Pursuit 1. convolve signal with kernelsNIPS 2007 Tutorial 49 Michael S. Lewicki ! Carnegie Mellon
  • Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution setNIPS 2007 Tutorial 50 Michael S. Lewicki ! Carnegie Mellon
  • Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution set 3. fit signal with kernelNIPS 2007 Tutorial 51 Michael S. Lewicki ! Carnegie Mellon
  • Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution set 3. fit signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutionsNIPS 2007 Tutorial 52 Michael S. Lewicki ! Carnegie Mellon
  • Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution set 3. fit signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeatNIPS 2007 Tutorial 53 Michael S. Lewicki ! Carnegie Mellon
  • Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution set 3. fit signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeatNIPS 2007 Tutorial 54 Michael S. Lewicki ! Carnegie Mellon
  • Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution set 3. fit signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeatNIPS 2007 Tutorial 55 Michael S. Lewicki ! Carnegie Mellon
  • Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution set 3. fit signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeatNIPS 2007 Tutorial 56 Michael S. Lewicki ! Carnegie Mellon
  • Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution set 3. fit signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeat . . .NIPS 2007 Tutorial 57 Michael S. Lewicki ! Carnegie Mellon
  • Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. find largest peak over convolution set 3. fit signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeat . . . 6. halt when desired fidelity is reachedNIPS 2007 Tutorial 58 Michael S. Lewicki ! Carnegie Mellon
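The six steps above map directly onto a small matching-pursuit routine. The sketch below is a simplified version: it recomputes all correlations on every iteration instead of incrementally adjusting them, assumes unit-norm 1-D kernels, and uses an illustrative SNR-based halting criterion.

```python
import numpy as np

def matching_pursuit(x, kernels, snr_db=20.0, max_spikes=10_000):
    """Greedy matching pursuit with time-shiftable kernels.

    Returns a list of spikes (kernel_index, time, amplitude) and the residual signal.
    Kernels are assumed to be unit-norm 1-D arrays shorter than the signal.
    """
    residual = x.astype(float).copy()
    target_power = np.mean(x**2) / (10 ** (snr_db / 10.0))   # halting criterion
    spikes = []
    while len(spikes) < max_spikes and np.mean(residual**2) > target_power:
        # 1-2. correlate the residual with every kernel and find the largest peak
        best = None
        for m, phi in enumerate(kernels):
            corr = np.correlate(residual, phi, mode="valid")
            tau = int(np.argmax(np.abs(corr)))
            if best is None or abs(corr[tau]) > abs(best[2]):
                best = (m, tau, corr[tau])
        m, tau, s = best
        # 3-4. fit the kernel, subtract it from the residual, record the spike
        residual[tau:tau + len(kernels[m])] -= s * kernels[m]
        spikes.append((m, tau, s))
        # 5-6. repeat until the desired fidelity is reached
    return spikes, residual
```

A reconstruction from the returned spike list can then be formed exactly as in the decoding sketch after the Smith and Lewicki (2005) slide.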
  • “can” 5 dB SNR, 36 spikes, 145 sp/sec. [Spikegram with input, reconstruction, and residual.] NIPS 2007 Tutorial 59 Michael S. Lewicki ! Carnegie Mellon
  • “can” 10 dB SNR, 93 spikes, 379 sp/sec. [Spikegram with input, reconstruction, and residual.] NIPS 2007 Tutorial 60 Michael S. Lewicki ! Carnegie Mellon
  • “can” 20 dB SNR, 391 spikes, 1700 sp/sec. [Spikegram with input, reconstruction, and residual.] NIPS 2007 Tutorial 61 Michael S. Lewicki ! Carnegie Mellon
  • “can” 40 dB SNR, 1285 spikes, 5238 sp/sec. [Spikegram with input, reconstruction, and residual.] NIPS 2007 Tutorial 62 Michael S. Lewicki ! Carnegie Mellon
  • Efficient auditory coding with optimized kernel shapes. Smith and Lewicki (2006) Nature 439:978-982. What are the optimal kernel shapes? x(t) = Σm Σi s_{m,i} φm(t − τ_{m,i}) + ε(t). Adapt algorithm of Olshausen (2002). NIPS 2007 Tutorial 63 Michael S. Lewicki ! Carnegie Mellon
  • Optimizing the probabilistic model. x(t) = Σm Σi s_i^m φm(t − τ_i^m) + ε(t), with ε(t) ∼ N(0, σ_ε). p(x|Φ) = ∫∫ p(x|Φ, s, τ) p(s) p(τ) ds dτ ≈ p(x|Φ, ŝ, τ̂) p(ŝ) p(τ̂). Learning (Olshausen, 2002): ∂/∂φm log p(x|Φ) = ∂/∂φm [log p(x|Φ, ŝ, τ̂) + log p(ŝ)p(τ̂)], where the data term is (1/2σ_ε) [x − Σm Σi ŝ_i^m φm(t − τ_i^m)]², giving a gradient of the form (1/σ_ε) Σi [x − x̂] ŝ_i^m. Also adapt kernel lengths. NIPS 2007 Tutorial 64 Michael S. Lewicki ! Carnegie Mellon
  • Adapting the optimal kernel shapesNIPS 2007 Tutorial 65 Michael S. Lewicki ! Carnegie Mellon
  • Kernel functions optimized for coding speechNIPS 2007 Tutorial 66 Michael S. Lewicki ! Carnegie Mellon
  • Quantifying coding efficiency: 1. fit the signal with the spike code, x(t) = Σm Σi s_{m,i} φm(t − τ_{m,i}) + ε(t); 2. quantize time and amplitude values; 3. prune zero values; 4. measure coding efficiency using the entropy of the quantized values; 5. reconstruct the signal using the quantized values; 6. measure fidelity using the signal-to-noise ratio (SNR) of the residual error. Identical procedure for other codes (e.g. Fourier and wavelet). [Figure: spikegram, original, reconstruction, and residual error.] NIPS 2007 Tutorial 67 Michael S. Lewicki ! Carnegie Mellon
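A sketch of steps 4-6 above: estimate the bit rate from the empirical entropy of the quantized spike amplitudes and inter-spike intervals, and measure fidelity as the SNR of the residual. The use of inter-spike intervals for the time code, the bin count, and the sampling-rate argument are my assumptions, not details from the tutorial.

```python
import numpy as np

def coding_efficiency(x, x_hat, amplitudes, times, fs=16000, n_bins=64):
    """Return (bits per second, SNR in dB) for a quantized spike code."""
    def empirical_entropy(values):
        counts, _ = np.histogram(values, bins=n_bins)
        p = counts[counts > 0] / counts.sum()
        return -np.sum(p * np.log2(p))        # bits per quantized value

    # total bits = (# spikes) x (amplitude entropy + inter-spike-interval entropy)
    bits = len(amplitudes) * (empirical_entropy(amplitudes)
                              + empirical_entropy(np.diff(np.sort(times))))
    residual = x - x_hat
    snr_db = 10 * np.log10(np.mean(x**2) / np.mean(residual**2))
    return bits / (x.size / fs), snr_db
```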
  • Coding efficiency curves. [Plot: SNR (dB) vs. rate (Kbps) for four codes: spike code with adapted kernels, spike code with gammatone kernels, wavelet block code, and Fourier block code; annotations: +14 dB, 4x more efficient.] NIPS 2007 Tutorial 68 Michael S. Lewicki ! Carnegie Mellon
  • Using efficient coding theory to make theoretical predictions. A simple model of auditory coding: sound waveform → filterbank → static non-linearities → stochastic spiking → population spike code. Physiological data: auditory nerve filter shapes and non-linearities, response properties, population trends, coding efficiency; auditory revcor filters are gammatones (deBoer and deJongh, 1978; Carney and Yin, 1988). Prediction: do kernels optimized for speech match the auditory nerve filters? We only compare to the data after optimizing; we do not fit the data, and the prediction depends on the sound ensemble. More theoretical questions: Why gammatones? Why spikes? How is sound coded by the spike population? NIPS 2007 Tutorial 69 Michael S. Lewicki ! Carnegie Mellon
  • Learned kernels share features of auditory nerve filters Auditory nerve filters Optimized kernels from Carney, McDuffy, and Shekhter, 1999 scale bar = 1 msecNIPS 2007 Tutorial 70 Michael S. Lewicki ! Carnegie Mellon
  • Learned kernels closely match individual auditory nerve filters: for each kernel, find the closest matching auditory nerve filter in Laurel Carney’s database of ~100 filters. NIPS 2007 Tutorial 71 Michael S. Lewicki ! Carnegie Mellon
  • Learned kernels overlaid on selected auditory nerve filters For almost all learned kernels there is a closely matching auditory nerve filter.NIPS 2007 Tutorial 72 Michael S. Lewicki ! Carnegie Mellon
  • Coding of a speech consonant. [Figure: (a) input, reconstruction, and residual waveforms; (b) spikegram, kernel CF (Hz) vs. time; (c) spectrogram, frequency (Hz) vs. time, over ~60 ms.] NIPS 2007 Tutorial 73 Michael S. Lewicki ! Carnegie Mellon
  • How is this achieving an efficient, time-relative code? Time-relative coding of glottal pulses. [Figure: spikegram, kernel CF (Hz) vs. time over ~100 ms, aligned to the glottal pulses.] NIPS 2007 Tutorial 74 Michael S. Lewicki ! Carnegie Mellon
  • Theory: explains data from principles; requires idealization and abstraction. Principle: code signals accurately and efficiently; adapted to natural sensory environment. Idealization: analog spikes. Methodology: information theory, natural sounds; optimization. Prediction: explains individual receptive fields; explains population organization. NIPS 2007 Tutorial 75 Michael S. Lewicki ! Carnegie Mellon
  • A gap in the theory? [Figure: a center-surround receptive field (from Hubel, 1995) and learned receptive fields.] NIPS 2007 Tutorial 76 Michael S. Lewicki ! Carnegie Mellon
  • Redundancy reduction for noisy channels (Atick, 1992). [Diagram: input s → channel → x → recoding A → output y; channel models y = x + noise and y = Ax + noise.] Mutual information: I(x, s) = Σ_{s,x} P(x, s) log2 [P(x, s) / (P(s)P(x))]. I(x, s) = 0 iff P(x, s) = P(x)P(s), i.e. x and s are independent. NIPS 2007 Tutorial 77 Michael S. Lewicki ! Carnegie Mellon
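The mutual information on this slide can be estimated directly from a joint histogram of samples; the bin counts below are arbitrary, and this plug-in estimator carries a small positive bias.

```python
import numpy as np

def mutual_information(x, s, bins=32):
    """Estimate I(x, s) = sum_{x,s} P(x,s) log2 [P(x,s) / (P(x)P(s))] from samples."""
    joint, _, _ = np.histogram2d(x, s, bins=bins)
    pxs = joint / joint.sum()
    px = pxs.sum(axis=1, keepdims=True)
    ps = pxs.sum(axis=0, keepdims=True)
    nonzero = pxs > 0
    return np.sum(pxs[nonzero] * np.log2(pxs[nonzero] / (px @ ps)[nonzero]))

# Independent variables give I close to 0; a noisy copy gives I > 0.
rng = np.random.default_rng(0)
s = rng.standard_normal(100_000)
print(mutual_information(rng.standard_normal(100_000), s, bins=16))  # ~0 (up to bias)
print(mutual_information(s + 0.5 * rng.standard_normal(100_000), s, bins=16))
```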
  • Profiles of optimal filters • high SNR: reduce redundancy; center-surround filters • low SNR: average; low-pass filter • matches behavior of retinal ganglion cells NIPS 2007 Tutorial 78 Michael S. Lewicki ! Carnegie Mellon
  • An observation: Contrast sensitivity of ganglion cells Luminance level decreases one log unit each time we go to lower curve.NIPS 2007 Tutorial 79 Michael S. Lewicki ! Carnegie Mellon
  • Natural images have a 1/f amplitude spectrum Field, 1987NIPS 2007 Tutorial 80 Michael S. Lewicki ! Carnegie Mellon
  • Components of predicted filters whitening filter low-pass filter optimal filter from Atick, 1992NIPS 2007 Tutorial 81 Michael S. Lewicki ! Carnegie Mellon
  • Predicted contrast sensitivity functions match neural data from Atick, 1992NIPS 2007 Tutorial 82 Michael S. Lewicki ! Carnegie Mellon
  • Robust coding of natural images • Theory refined: the image is noisy and blurred; the neural population size changes; the neurons are noisy. [Figure (from Hubel, 1995 and Doi and Lewicki, 2006): (a) undistorted image, (b) foveal retinal image, (c) 40 degrees eccentricity; intensity vs. visual angle (arc min).] NIPS 2007 Tutorial 83 Michael S. Lewicki ! Carnegie Mellon
  • Problem 1: Real neurons are “noisy”. Estimates of neural information capacity (after Borst and Theunissen, 1999), listed as system (area), stimulus, bits/sec, bits/spike: fly visual (H1), motion, 64, ~1; monkey visual (MT), motion, 5.5–12, 0.6–1.5; frog auditory (auditory nerve), noise & call, 46 & 133, 1.4 & 7.8; salamander visual (ganglion cells), rand. spots, 3.2, 1.6; cricket cercal (sensory afferent), mech. motion, 294, 3.2; cricket cercal (sensory afferent), wind noise, 75–220, 0.6–3.1; cricket cercal (10-2 and 10-3), wind noise, 8–80, avg. = 1; electric fish (P-afferent), amp. modulation, 0–200, 0–1.2. Limited capacity neural codes need to be robust. NIPS 2007 Tutorial 84 Michael S. Lewicki ! Carnegie Mellon
  • Traditional codes are not robust encoding neurons sensory input OriginalNIPS 2007 Tutorial 85 Michael S. Lewicki ! Carnegie Mellon
  • Traditional codes are not robust encoding neurons Add noise equivalent to 1 bit precision sensory input Original 1x efficient coding reconstruction (34% error) 1 bit precisionNIPS 2007 Tutorial 86 Michael S. Lewicki ! Carnegie Mellon
  • Redundancy plays an important role in neural coding. [Figure (from Puchalla et al, 2005): responses of salamander retinal ganglion cells to natural movies; fractional redundancy of cell pairs vs. pair distance (µm). Redundancy is measured as the information a cell pair conveys about the stimulus relative to the minimum of the individual cells’ information, min{I(Ra;S), I(Rb;S)}; it is 0 when the two cells convey independent information and 1 when the cells encode exactly the same information.] NIPS 2007 Tutorial 87 Michael S. Lewicki ! Carnegie Mellon
  • Robust coding (Doi and Lewicki, 2005, 2006). Robust coding introduces redundancy into the code so that coding precision can be separated from the representation; unlike PCA or ICA, the apparent dimensionality can be increased (instead of reduced), making optimal use of an arbitrary number of coding units. Model: the data x is N-dimensional with zero mean and covariance Σx; the encoder is W ∈ R^{M×N} and the decoder is A ∈ R^{N×M}. The representation is perturbed by additive channel noise n ∼ N(0, σn² I): r = Wx + n = u + n. Reconstruction from the noisy representation: x̂ = Ar = AWx + An. The rows of W are encoding vectors and the columns of A are decoding vectors. Caveat: the code is linear, but can be evaluated for different noise levels. NIPS 2007 Tutorial 88 Michael S. Lewicki ! Carnegie Mellon
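A minimal simulation of the model above: encode with W, add channel noise, decode with A, and measure the relative reconstruction error. The random encoder, the pseudoinverse decoder, and the noise level are placeholders; the optimal decoder appears on a later slide.

```python
import numpy as np

def simulate_robust_code(X, W, A, sigma_n=1.0, seed=0):
    """Simulate r = Wx + n, x_hat = Ar, and return the relative reconstruction error.

    X : (N, n_samples) data matrix; W : (M, N) encoder; A : (N, M) decoder.
    """
    rng = np.random.default_rng(seed)
    U = W @ X                                          # noiseless representation
    R = U + sigma_n * rng.standard_normal(U.shape)     # channel noise
    X_hat = A @ R
    return np.mean((X - X_hat) ** 2) / np.mean(X ** 2)

# Tiny demo: 4-D Gaussian data, an 8-unit (2x overcomplete) random code.
rng = np.random.default_rng(1)
X = rng.standard_normal((4, 10_000))
W = rng.standard_normal((8, 4))
A = np.linalg.pinv(W)            # naive decoder; see the optimal A two slides below
print(simulate_robust_code(X, W, A, sigma_n=0.5))
```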
  • Generalizing the model: sensory noise and optical blur. [Diagram: image s → optical blur H → observation x (with sensory noise) → encoder W → representation r (with channel noise) → decoder A → reconstruction ŝ.] Can also add sparseness and resource constraints. NIPS 2007 Tutorial 89 Michael S. Lewicki ! Carnegie Mellon
  • Robust coding is distinct from “reading” noisy populations. [Figure (from Pouget et al, 2001): the standard population coding model. a, bell-shaped tuning curves to direction; b, a noisy population pattern of activity in response to a moving object; c, population vector decoding fits a cosine function and uses its peak as the estimate; d, maximum likelihood fits a template derived from the tuning curves.] Here, we want to learn an optimal image code using noisy neurons. NIPS 2007 Tutorial 90 Michael S. Lewicki ! Carnegie Mellon
  • How do we learn robust codes? [Diagram: image → optical blur → observation (with sensory noise) → encoder W → representation (with channel noise) → decoder A → reconstruction.] Objective: find W and A that minimize the reconstruction error, Cost = Error + λ (variance constraint). Setting ∂Cost/∂A = 0 gives A = Σx Wᵀ (σn² I_M + W Σx Wᵀ)⁻¹. The channel capacity of the i-th neuron is C_i = ½ ln(SNR_i + 1); to limit capacity, fix the coefficient signal-to-noise ratio SNR_i = ⟨u_i²⟩/σn². Robust coding is then formulated as a constrained optimization problem. NIPS 2007 Tutorial 91 Michael S. Lewicki ! Carnegie Mellon
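A sketch of the closed-form decoder from this slide and the resulting expected error. The error expression in mean_error is my own expansion of E‖x − A(Wx + n)‖², not a formula taken from the tutorial.

```python
import numpy as np

def optimal_decoder(W, Sigma_x, sigma_n):
    """MMSE decoder for r = Wx + n: A = Sigma_x W^T (sigma_n^2 I + W Sigma_x W^T)^-1."""
    M = W.shape[0]
    return Sigma_x @ W.T @ np.linalg.inv(sigma_n**2 * np.eye(M) + W @ Sigma_x @ W.T)

def mean_error(W, Sigma_x, sigma_n):
    """Expected reconstruction error with the optimal decoder."""
    A = optimal_decoder(W, Sigma_x, sigma_n)
    N = Sigma_x.shape[0]
    # error covariance: (I - AW) Sigma_x (I - AW)^T + sigma_n^2 A A^T
    err_cov = (np.eye(N) - A @ W) @ Sigma_x @ (np.eye(N) - A @ W).T + sigma_n**2 * A @ A.T
    return np.trace(err_cov)

# Demo: a random encoder on correlated 4-D data with 8 noisy units.
rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
Sigma_x = B @ B.T
W = rng.standard_normal((8, 4))
print(mean_error(W, Sigma_x, sigma_n=1.0))
```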
  • Robust coding of natural images encoding neurons Add noise equivalent to 1 bit precision sensory input Original 1x efficient coding reconstruction (34% error) 1 bit precisionNIPS 2007 Tutorial 92 Michael S. Lewicki ! Carnegie Mellon
  • Robust coding of natural images encoding neurons Weights adapted for optimal robustness sensory input Original 1x robust coding reconstruction (3.8% error) 1 bit precisionNIPS 2007 Tutorial 93 Michael S. Lewicki ! Carnegie Mellon
  • Reconstruction improves by adding neurons encoding neurons Weights adapted for optimal robustness sensory input Original 8x robust coding reconstruction error: 0.6% 1 bit precisionNIPS 2007 Tutorial 94 Michael S. Lewicki ! Carnegie Mellon
  • Adding precision to 1x robust coding original images 1 bit: 12.5% error 2 bit: 3.1% errorNIPS 2007 Tutorial 95 Michael S. Lewicki ! Carnegie Mellon
  • Can derive a minimum theoretical average error bound. For an N-dimensional input coded by M units at a given channel SNR, the bound takes the form E = (1/N) Σi λi / ((M/N)·SNR + 1) when SNR ≥ SNR_c, where Σx = EDEᵀ is the eigenvalue decomposition of the data covariance and λi ≡ Dii are its eigenvalues. The algorithm achieves the theoretical lower bound on a randomly sampled 512×512 pixel test image (results vs. bound): 0.5x, 20.3% vs. 19.9%; 1x, 12.5% vs. 12.4%; 8x, 2.0% vs. 2.0%. Balcan, Doi, and Lewicki, 2007; Balcan and Lewicki, 2007. NIPS 2007 Tutorial 96 Michael S. Lewicki ! Carnegie Mellon
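The bound, as reconstructed above, is a one-liner to evaluate. The eigenvalues, unit counts, and SNR below are made-up numbers for illustration, and the slide notes that the expression only holds above a critical SNR.

```python
import numpy as np

def robust_coding_error_bound(eigenvalues, M, snr):
    """Average-error bound E = (1/N) * sum_i lambda_i / ((M/N) * SNR + 1).

    eigenvalues : eigenvalues lambda_i of the data covariance (length N)
    M           : number of coding units; snr : per-unit signal-to-noise ratio
    """
    lam = np.asarray(eigenvalues, dtype=float)
    N = lam.size
    return np.mean(lam / ((M / N) * snr + 1.0))

# Example: increasing the number of units M lowers the achievable error.
lam = np.array([4.0, 2.0, 1.0, 0.5])
for M in (4, 8, 32):
    print(M, robust_coding_error_bound(lam, M, snr=1.0))
```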
  • Sparseness localizes the vectors and increases coding efficiency. [Figure: encoding vectors (W), decoding vectors (A), and normalized histograms of coefficient kurtosis.] Robust coding: average kurtosis 4.9. Robust sparse coding: average kurtosis 7.3. The average error is unchanged (12.5% for both). NIPS 2007 Tutorial 97 Michael S. Lewicki ! Carnegie Mellon
  • Non-zero resource constraints localize weights. [Figure: learned weights for the fovea and for 40 degrees eccentricity at 20, 10, 0, and −10 dB, with panels showing weight magnitude and spatial frequency.] The average error is also unchanged. NIPS 2007 Tutorial 98 Michael S. Lewicki ! Carnegie Mellon
  • Theory: explains data from principles; requires idealization and abstraction. Principle: code signals accurately, efficiently, robustly; adapted to natural sensory environment. Idealization: cell response is linear, noise is additive; simple, additive resource constraints. Methodology: information theory, natural images; constrained optimization. Prediction: explains individual receptive fields; explains non-linear response; explains population organization. NIPS 2007 Tutorial 99 Michael S. Lewicki ! Carnegie Mellon
  • What happens next? [Figure: a center-surround receptive field (from Hubel, 1995) and learned receptive fields.] NIPS 2007 Tutorial 100 Michael S. Lewicki ! Carnegie Mellon
  • What happens next?NIPS 2007 Tutorial 101 Michael S. Lewicki ! Carnegie Mellon
  • Joint statistics of filter outputs show magnitude dependence. [Figure (Schwartz and Simoncelli, 2001): an image seen through two linear filters; conditional histograms of one filter response, L2, given the other, L1. The responses are roughly decorrelated but not statistically independent: the spread of L2 increases with the magnitude of L1.] A form of automatic gain control known as divisive normalization has been used to account for many nonlinear steady-state behaviors of neurons in primary visual cortex. NIPS 2007 Tutorial 102 Michael S. Lewicki ! Carnegie Mellon
  • Using sensory gain control to reduce redundancy. Schwartz and Simoncelli (2001): a divisive normalization model in which a filter’s squared response is divided by a weighted sum of the squared responses of neighboring filters; the weights are fit by maximum likelihood. Adapt the weights to minimize higher-order dependencies from natural images. NIPS 2007 Tutorial 103 Michael S. Lewicki ! Carnegie Mellon
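A sketch of the divisive-normalization computation described above, applied to synthetic filter outputs that share a common gain (a stand-in for the magnitude dependence seen in natural images). The weights, the constant sigma, and the shared-gain toy data are my choices, not values from the paper.

```python
import numpy as np

def divisive_normalization(responses, weights, sigma=1.0):
    """Gain-controlled responses: R_i = L_i^2 / (sigma^2 + sum_j w_ij L_j^2).

    responses : (n_filters, n_samples) linear filter outputs L_i
    weights   : (n_filters, n_filters) nonnegative normalization weights w_ij
    """
    sq = responses ** 2
    return sq / (sigma ** 2 + weights @ sq)

# Demo: 10 filters whose magnitudes co-vary through a shared gain.
rng = np.random.default_rng(0)
K = 10
common_gain = np.exp(rng.standard_normal(50_000))          # shared "volatility"
L = common_gain * rng.standard_normal((K, 50_000))
w = (np.ones((K, K)) - np.eye(K)) / (K - 1)                 # normalize by the other filters
R = divisive_normalization(L, w, sigma=0.1)
print(np.corrcoef(L[0]**2, L[1]**2)[0, 1])   # clearly positive before normalization
print(np.corrcoef(R[0], R[1])[0, 1])         # typically much closer to zero afterwards
```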
  • Model accounts for several non-linear response properties. [Figure: mean firing rate vs. signal contrast at several mask contrasts, for a recorded cell (Cavanaugh et al., 2000) and for the model; analogous rate-level curves with a masking tone are shown for auditory data.] Schwartz and Simoncelli (2001). NIPS 2007 Tutorial 104 Michael S. Lewicki ! Carnegie Mellon
  • What about image structure? • Bottom-up approaches focus on the non-linearity. • Our aim here is to focus on the computational problem: How do we learn the intrinsic dimensions of natural image structure? • Idea: characterize how the local image distribution changes.NIPS 2007 Tutorial 105 Michael S. Lewicki ! Carnegie Mellon
  • Perceptual organization in natural scenes image of Kyoto, Japan from E. DoiNIPS 2007 Tutorial 106 Michael S. Lewicki ! Carnegie Mellon
  • Palmer and Rock’s theory of perceptual organization Palmer and Rock, 1994NIPS 2007 Tutorial 107 Michael S. Lewicki ! Carnegie Mellon
  • Gestalt grouping principles from Palmer, 1999NIPS 2007 Tutorial 108 Michael S. Lewicki ! Carnegie Mellon
  • Malik and Perona’s (1991) model of texture segregation input image output of dark bar filters output of light bar filters Visual FeaturesNIPS 2007 Tutorial 109 Michael S. Lewicki ! Carnegie Mellon
  • Malik and Perona algorithm can find texture boundaries Painting by Gustav Klimt texture boundaries from Malik and Perona algorithmNIPS 2007 Tutorial 110 Michael S. Lewicki ! Carnegie Mellon
  • Can we apply these to natural scenes? natural scenes are more complex, and the structures more subtleNIPS 2007 Tutorial 111 Michael S. Lewicki ! Carnegie Mellon
  • A different representation of a natural scene (Kersten and Yuille, 2003)NIPS 2007 Tutorial 112 Michael S. Lewicki ! Carnegie Mellon
  • A representation we’re more familiar withNIPS 2007 Tutorial 113 Michael S. Lewicki ! Carnegie Mellon
  • A representation we’re more familiar withNIPS 2007 Tutorial 114 Michael S. Lewicki ! Carnegie Mellon
  • This is what our brain doesNIPS 2007 Tutorial 115 Michael S. Lewicki ! Carnegie Mellon
  • How do neurons in the visual cortex generalize? Conjecture 1: Two regions are similar if they come from the same statistical distribution. Conjecture 2: Neurons encode the local distribution of natural images. (image data from Doi et al, 2003)NIPS 2007 Tutorial 116 Michael S. Lewicki ! Carnegie Mellon
  • Perceptual generalization in natural scenesNIPS 2007 Tutorial 117 Michael S. Lewicki ! Carnegie Mellon
  • Perceptual generalization in natural scenesNIPS 2007 Tutorial 118 Michael S. Lewicki ! Carnegie Mellon
  • Linear representations do not separate the image classes• bushes ...• hillside ...• tree edge ...• tree bark ... projection onto the first 2 principal components of the dataNIPS 2007 Tutorial Michael S. Lewicki ! Carnegie Mellon
  • Modeling local natural scene statistics • need to model local scene structure, not average scene statistics • need to model all structure - want a “complete” code - a universal “texture” model • code should be distributed and statistically efficientNIPS 2007 Tutorial Michael S. Lewicki ! Carnegie Mellon
  • ICA mixtures for similar images from Lee, Lewicki, and Sejnowski 2000 Limitations: • can only have a small number of classes • representations are not shared • cannot learn intrinsic dimensionsNIPS 2007 Tutorial Michael S. Lewicki ! Carnegie Mellon 121
  • Characterizing different natural image densitiesNIPS 2007 Tutorial 122 Michael S. Lewicki ! Carnegie Mellon
  • Characterizing different natural image densities !3 !2 !1 0 1 2 3NIPS 2007 Tutorial 123 Michael S. Lewicki ! Carnegie Mellon
  • Characterizing different natural image densities !3 !2 !1 0 1 2 3NIPS 2007 Tutorial 124 Michael S. Lewicki ! Carnegie Mellon
  • Characterizing different natural image densitiesNIPS 2007 Tutorial 125 Michael S. Lewicki ! Carnegie Mellon
  • Characterizing different natural image distributionsNIPS 2007 Tutorial 126 Michael S. Lewicki ! Carnegie Mellon
  • Generalizing the standard ICA model. Karklin and Lewicki (2003; 2005). P(s) = ∏i P(si), with P(si) ∝ exp(−|si/λi|^qi). NIPS 2007 Tutorial 127 Michael S. Lewicki ! Carnegie Mellon
  • Generalizing the standard ICA model. Karklin and Lewicki (2003; 2005). Each coefficient has P(ui|λi) ∝ exp(−|ui/λi|^qi), with the scales set by higher-order variables, log λi = [Bv]i, so that −log p(u|B, v) ∝ Σi |ui / exp([Bv]i)|^qi. NIPS 2007 Tutorial 128 Michael S. Lewicki ! Carnegie Mellon
  • Independent density components: −log p(u|B, v) ∝ Σi |ui / exp([Bv]i)|^qi. [Figure: a density component Bj and the distributions P(u|vj = 2.0), P(u|vj = 0.0), P(u|vj = −2.0).] NIPS 2007 Tutorial 129 Michael S. Lewicki ! Carnegie Mellon
  • Illustration of inference in the model using synthesized data. Priors: P(vi) ∝ exp(−|vi|^qi), P(ui) ∝ exp(−|ui/λi|^qi), log λi = [Bv]i. Synthesis: v ∼ p(v); λ = c exp(Bv); u ∼ p(u|λ). Inference: v̂ = arg max_v P(v|B, u), giving λ̂ and v̂. [Figure panels: true B vs. learned B, and the synthesized and inferred variables.] NIPS 2007 Tutorial 130 Michael S. Lewicki ! Carnegie Mellon
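A sketch of the synthesis step of the hierarchical model (v ∼ p(v), λ = c·exp(Bv), u ∼ p(u|λ)), using Laplacian (q = 1) priors as an illustrative choice; MAP inference of v would then maximize p(v|B, u) by gradient ascent, which is not shown here.

```python
import numpy as np

def synthesize(B, c=1.0, n_samples=1000, seed=0):
    """Sample the hierarchical model: v ~ p(v), lambda = c*exp(B v), u_i ~ p(u_i | lambda_i).

    Both p(v) and p(u|lambda) are taken to be Laplacian (generalized Gaussian, q = 1).
    B has shape (n_coefficients, n_higher_order_units).
    """
    rng = np.random.default_rng(seed)
    n_coeffs, n_hidden = B.shape
    v = rng.laplace(size=(n_hidden, n_samples))            # higher-order variables
    lam = c * np.exp(B @ v)                                 # per-coefficient scales
    u = lam * rng.laplace(size=(n_coeffs, n_samples))       # coefficients with modulated variance
    return v, u

# Tiny demo: one higher-order unit that scales the first half of the coefficients together.
B = np.zeros((10, 1)); B[:5, 0] = 1.0
v, u = synthesize(B)
print(np.std(u[:5], axis=0)[:3], np.std(u[5:], axis=0)[:3])  # first half co-varies with v
```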
  • Learning higher-order structure of natural images• A is learned with ICA• B is learned by maximizing the posterior distribution Train on natural imagesNIPS 2007 Tutorial 131 Michael S. Lewicki ! Carnegie Mellon
  • Interpreting higher-order density components raw weights Bi position orientation/scale vi Gabor function fitsNIPS 2007 Tutorial 132 Michael S. Lewicki ! Carnegie Mellon
  • Interpreting higher-order density components viNIPS 2007 Tutorial 133 Michael S. Lewicki ! Carnegie Mellon
  • Interpreting higher-order density components viNIPS 2007 Tutorial 134 Michael S. Lewicki ! Carnegie Mellon
  • Interpreting higher-order density components viNIPS 2007 Tutorial 135 Michael S. Lewicki ! Carnegie Mellon
  • Interpreting higher-order density components viNIPS 2007 Tutorial 136 Michael S. Lewicki ! Carnegie Mellon
  • Interpreting higher-order density components viNIPS 2007 Tutorial 137 Michael S. Lewicki ! Carnegie Mellon
  • Interpreting higher-order density components viNIPS 2007 Tutorial 138 Michael S. Lewicki ! Carnegie Mellon
  • Interpreting higher-order density components viNIPS 2007 Tutorial 139 Michael S. Lewicki ! Carnegie Mellon
  • Learned density components of natural images (30 out of 100 shown)NIPS 2007 Tutorial 140 Michael S. Lewicki ! Carnegie Mellon
  • Full set of natural image density componentsNIPS 2007 Tutorial 141 Michael S. Lewicki ! Carnegie Mellon
  • What about a feed-forward non-linearity? Isn’t this the same? 1. Take the standard ICA model. 2. Add a non-linearity: λi = log |si|. 3. Do ICA again on the output. NIPS 2007 Tutorial 142 Michael S. Lewicki ! Carnegie Mellon
  • ICA on non-linearity reveals no structure subset of ICA basis functions on log |s|NIPS 2007 Tutorial 143 Michael S. Lewicki ! Carnegie Mellon
  • Inferred v forms a sparse distributed codeNIPS 2007 Tutorial 144 Michael S. Lewicki ! Carnegie Mellon
  • Most typical images for selected density componentsNIPS 2007 Tutorial 145 Michael S. Lewicki ! Carnegie Mellon
  • Distributed representation of natural image densitiesNIPS 2007 Tutorial 146 Michael S. Lewicki ! Carnegie Mellon
  • Comparing the degree of abstractionNIPS 2007 Tutorial 147 Michael S. Lewicki ! Carnegie Mellon
  • Winner maps: Maximum |ui| for each pixelNIPS 2007 Tutorial 148 Michael S. Lewicki ! Carnegie Mellon
  • Winner maps: Maximum |v | for each pixel iNIPS 2007 Tutorial 149 Michael S. Lewicki ! Carnegie Mellon
  • A distributed code for visual surfacesNIPS 2007 Tutorial 150 Michael S. Lewicki ! Carnegie Mellon
  • Smooth changes in representation for texture gradients. [Figure: image, higher-level output v, and its projection onto the first 3 principal components.]
  • Smooth changes in representation for texture gradients higher-level output First 3 PCs
  • Clustering the higher-order representation yields segmentation clustering v’s clustering color
  • Model can be extended to model local covariance structure
  • Linear code of image regions distributions in simple cell space (V1) 2D projection of 400D space
  • Distributed covariance model of image regions distributions in higher-order space 2D projection of 150D space For more, see workshop talk
  • Theory: explains data from principles; requires idealization and abstraction. Principle: code signals accurately and efficiently; adapted to natural sensory environment. Idealization: neurons encode local image distributions. Methodology: information theory, natural images; hierarchical, probabilistic models. Prediction: good generalization in natural images; functional explanation of non-simple cells? NIPS 2007 Tutorial 157 Michael S. Lewicki ! Carnegie Mellon
  • Thank you!