NIPS2007: sensory coding and hierarchical representations
1.
Sensory Coding & Hierarchical Representations Michael S. Lewicki Computer Science Department &Center for the Neural Basis of Cognition Carnegie Mellon University
2.
“IT” V4 V2 retina LGN V1NIPS 2007 Tutorial 2 Michael S. Lewicki ! Carnegie Mellon
3.
Fig. 1. Anatomical and functional hierarchies in the macaque visual system. The human Hedgé and Felleman, 2007 from visual system (not shown) isbelieved to be roughly similar. A, A schematic summary of the laminar patterns of feed-forward (or ascending) and feed-back (or descending) connections for visual area V1. The laminar patterns vary somewhat from one visual area to the next.But in general, the connections are complementary, so that the ascending connections terminate in the granular layer (layer4) and the descending connections avoid it. The connections are generally reciprocal, in that an area that sends feed-forward connections to another area also receives feedback connections from it. The visual anatomical hierarchy is definedbased on, among other things, the laminar patterns of these interconnections among the various areas. See text for details.B, A simplified version of the visual anatomical hierarchy in the macaque monkey. For the complete version, see Fellemanand Van Essen (1991). See text for additional details. AIT = anterior inferotemporal; LGN = lateral geniculate nucleus;LIP = lateral intraparietal; MT = middle temporal; MST = medial superior temporal; PIT = posterior inferotemporal; V1 =visual area 1; V2 = visual area 2; V4 = visual area 4; VIP = ventral intraparietal. C, A model of hierarchical processing ofvisual information proposed by David Marr (1982). D, A schematic illustration of the presumed parallels between theanatomical and functional hierarchies. It is widely presumed 3 only that visual processingMichael S. Lewicki ! Carnegie Mellon the NIPS 2007 Tutorial not is hierarchical but also that
4.
Palmer and Rock, 1994NIPS 2007 Tutorial 4 Michael S. Lewicki ! Carnegie Mellon
5.
V1 simple cells !"#"$%&"()&"*+(,)(-(./(0&1$*"(#" 2"345"*&06 789-: Hubel and Wiesel, 1959 DeAngelis, et al, 1995NIPS 2007 Tutorial 5 Michael S. Lewicki ! Carnegie Mellon
6.
NIPS 2007 Tutorial 6 Michael S. Lewicki ! Carnegie Mellon
7.
NIPS 2007 Tutorial 7 Michael S. Lewicki ! Carnegie Mellon
8.
Out of the retina • > 23 distinct neural pathways, no simple function division • Suprachiasmatic nucleus: circadian rhythm • Accessory optic system: stabilize retinal image • Superior colliculus: integrate visual and auditory information with head movements, direct eyes • Pretectum: plays adjusting pupil size, track large moving objects • Pregeniculate: cells responsive to ambient light • Lateral geniculate (LGN): main “relay” to visual cortex; contains 6 distinct layers, each with 2 sublayers. Organization very complexNIPS 2007 Tutorial 8 Michael S. Lewicki ! Carnegie Mellon
9.
V1 simple cell integration of LGN cells !"#"$%&"(()*+,-+)".*)(/"*"&0.1/("2(//& !"#$%&(")&* +,%"- 345 2(//& 60.1/( 2(// Hubel and Wiesel, 1963 !78(/"#"$0(&(/9":;<=NIPS 2007 Tutorial 9 Michael S. Lewicki ! Carnegie Mellon
10.
Hyper-column model Hubel and Wiesel, 1978NIPS 2007 Tutorial 10 Michael S. Lewicki ! Carnegie Mellon
11.
Anatomical circuitry in V1 54 CALLAWAY Figure 1 Contributions of individual neurons to local excitatory connections between cortical layers. From left to right, typical spiny from Callaway, 1998 4C, 2–4B, 5, and 6 are shown. Dendritic neurons in layers arbors are illustrated with thick lines and axonal arbors with ﬁner lines. Each cell projects axons speciﬁcally to only a subset of the layers. This simpliﬁed diagram focuses on connections between layers 2–4B, 4C, 5, and 6 and does not incorporate details of circuitry speciﬁc for subdivisions ofNIPS 2007 Tutorial Michael S. Lewicki ! Carnegie these layers. A model of the interactions between these neurons is shown in Figure 2. The neurons Mellon 11
12.
Where is this headed? Ramon y CajalNIPS 2007 Tutorial 12 Michael S. Lewicki ! Carnegie Mellon
13.
A wing would be a most mystifying structureif one did not know that birds ﬂew. Horace Barlow, 1961An algorithm is likely to be understood morereadily by understanding the nature of theproblem being solved than by examining themechanism in which it is embodied. David Marr, 1982
14.
A theoretical approach • Look at the system from a functional perspective: What problems does it need to solve? • Abstract from the details: Make predictions from theoretical principles. • Models are bottom-up; theories are top-down. What are the relevant computational principles?NIPS 2007 Tutorial 14 Michael S. Lewicki ! Carnegie Mellon
15.
Representing structure in natural images image from Field (1994)NIPS 2007 Tutorial 15 Michael S. Lewicki ! Carnegie Mellon
16.
Representing structure in natural images image from Field (1994)NIPS 2007 Tutorial 16 Michael S. Lewicki ! Carnegie Mellon
17.
Representing structure in natural images image from Field (1994)NIPS 2007 Tutorial 17 Michael S. Lewicki ! Carnegie Mellon
18.
Representing structure in natural images Need to describe all image structure in the scene. What representation is best? image from Field (1994)NIPS 2007 Tutorial 18 Michael S. Lewicki ! Carnegie Mellon
19.
V1 simple cells !"#"$%&"()&"*+(,)(-(./(0&1$*"(#" 2"345"*&06 789-: Hubel and Wiesel, 1959 DeAngelis, et al, 1995NIPS 2007 Tutorial 19 Michael S. Lewicki ! Carnegie Mellon
20.
Oriented Gabor models of individual simple cells ﬁgure from Daugman, 1990; data from Jones and Palmer, 1987NIPS 2007 Tutorial 20 Michael S. Lewicki ! Carnegie Mellon
21.
2D Gabor wavelet captures spatial structure • 2D Gabor functions • Wavelet basis generated by dilations, translations, and rotations of a single basis function • Can also control phase and aspect ratio • (drifting) Gabor functions are what the eye “sees best”NIPS 2007 Tutorial 21 Michael S. Lewicki ! Carnegie Mellon
22.
Recoding with Gabor functions Pixel entropy = 7.57 bits Recoding with 2D Gabor functions Coefﬁcient entropy = 2.55 bitsNIPS 2007 Tutorial 22 Michael S. Lewicki ! Carnegie Mellon
23.
How is the V1 population organized?NIPS 2007 Tutorial 23 Michael S. Lewicki ! Carnegie Mellon
24.
A general approach to coding: redundancy reduction Correlation of adjacent pixels 1 0.8 0.6 0.4 0.2 0 0 0.2 0.4 0.6 0.8 1 Redundancy reduction is equivalent to efﬁcient coding.NIPS 2007 Tutorial 24 Michael S. Lewicki ! Carnegie Mellon
25.
Describing signals with a simple statistical model Principle Good codes capture the statistical distribution of sensory patterns. How do we describe the distribution? • Goal is to encode the data to desired precision x = a1 s1 + a2 s2 + · · · + aL sL + = As + • Can solve for the coefﬁcients in the no noise case ˆ=A s −1 xNIPS 2007 Tutorial 25 Michael S. Lewicki ! Carnegie Mellon
26.
An algorithm for deriving efﬁcient linear codes: ICA Learning objective: Using independent component analysis (ICA) to optimize A: maximize coding efﬁciency maximize P(x|A) over A. ∂ ∆A ∝ AA log P (x|A) T ∂A = −A(zsT − I) Probability of the pattern ensemble is: where z = (log P(s))’. P (x1 , x2 , ..., xN |A) = P (xk |A) k This learning rule: To obtain P(x|A) marginalize over s: • learns the features that capture the most structure P (x|A) = ds P (x|A, s)P (s) • optimizes the efﬁciency of the code P (s) = | det A| What should we use for P(s)?NIPS 2007 Tutorial 26 Michael S. Lewicki ! Carnegie Mellon
27.
Modeling Non-Gaussian distributions • Typical coeff. distributions of natural signals are non-Gaussian. !=2 !=4 !6 !4 !2 0 2 4 6 !6 !4 !2 0 2 4 6 s s 1 2NIPS 2007 Tutorial 27 Michael S. Lewicki ! Carnegie Mellon
28.
The generalized Gaussian distribution 1 q P (x|q) ∝ exp(− |x| ) 2 • Or equivalently, and exponential power distribution (Box and Tiao, 1973): 2/(1+β) ω(β) x−µ P (x|µ, σ, β) = exp −c(β) σ σ • " varies monotonically with the kurtosis, #2: !=!0.75 " =!1.08 !=!0.25 " =!0.45 !=+0.00 (Normal) " =+0.00 2 2 2 !=+0.50 (ICA tanh) "2=+1.21 !=+1.00 (Laplacian) "2=+3.00 !=+2.00 "2=+9.26NIPS 2007 Tutorial 28 Michael S. Lewicki ! Carnegie Mellon
29.
Modeling Gaussian distributions with PCA • !=0 !=0 Principal component analysis (PCA) describes the principal axes of variation in the data distribution. • This is equivalent to ﬁtting the data with a multivariate Gaussian. !6 !4 !2 0 2 4 6 !6 !4 !2 0 2 4 6 s1 s2NIPS 2007 Tutorial 29 Michael S. Lewicki ! Carnegie Mellon
30.
Modeling non-Gaussian distributions • !=2 !=4 What about non-Gaussian marginals? !6 !4 !2 0 2 4 6 !6 !4 !2 0 2 4 6 s1 s2 • How would this distribution be modeled by PCA?NIPS 2007 Tutorial 30 Michael S. Lewicki ! Carnegie Mellon
31.
Modeling non-Gaussian distributions • !=2 !=4 What about non-Gaussian marginals? !6 !4 !2 0 2 4 6 !6 !4 !2 0 2 4 6 s1 s2 • How would this distribution be modeled by PCA? • How should the distribution be described? The non-orthogonal ICA solution captures the non- Gaussian structureNIPS 2007 Tutorial 31 Michael S. Lewicki ! Carnegie Mellon
32.
Efﬁcient coding of natural images: Olshausen and Field, 1996 naturalscene nature scene visual input visual input units image basis functions receptive fields ... ... ... before after learning learningNetwork weights are adapted to maximize coding efﬁciency:minimizes redundancy and maximizes the independence of the outputsNIPS 2007 Tutorial 32 Michael S. Lewicki ! Carnegie Mellon
33.
Model predicts local and global receptive ﬁeld properties Learned basis for natural images Overlaid basis function properties from Lewicki and Olshausen, 1999NIPS 2007 Tutorial 33 Michael S. Lewicki ! Carnegie Mellon
34.
Algorithm selects best of many possible sensory codes Haar Learned Wavelet Gabor Fourier PCA Theoretical perspective: Not edge “detectors.” from Lewicki and Olshausen, 1999 An efﬁcient code for natural images.NIPS 2007 Tutorial 34 Michael S. Lewicki ! Carnegie Mellon
35.
Comparing coding efﬁciency on natural images Comparing codes on natural images 8 7 6 estimated bits per pixel 5 4 3 2 1 0 Gabor wavelet Haar Fourier PCA learnedNIPS 2007 Tutorial 35 Michael S. Lewicki ! Carnegie Mellon
36.
Responses in primary visual cortex to visual motion from Wandell, 1995NIPS 2007 Tutorial 36 Michael S. Lewicki ! Carnegie Mellon
37.
Sparse coding of time-varying images (Olshausen, 2002) "#$!% !( ! "#$&))!!!(% & ! *$&))!% ... & ! I(x, y, t) = ai (t )φi (x, y, t − t ) + (x, y, t) i t = ai (t) ∗ φi (x, y, t) + (x, y, t) iNIPS 2007 Tutorial 37 Michael S. Lewicki ! Carnegie Mellon
39.
Learned spatio-temporal basis functions timebasisfunction from Olshausen, 2002NIPS 2007 Tutorial 39 Michael S. Lewicki ! Carnegie Mellon
40.
Animated spatial-temporal basis functions from Olshausen, 2002NIPS 2007 Tutorial 40 Michael S. Lewicki ! Carnegie Mellon
41.
$ explains data from principles Theory $ requires idealization and abstraction $ code signals accurately and efﬁciently Principle $ adapted to natural sensory environment Idealization $ cell response is linear Methodology $ information theory, natural images $ explains individual receptive ﬁelds Prediction $ explains population organizationNIPS 2007 Tutorial 41 Michael S. Lewicki ! Carnegie Mellon
42.
How general is the efﬁcient coding principle? Can it explain auditory coding?NIPS 2007 Tutorial 42 Michael S. Lewicki ! Carnegie Mellon
43.
Limitations of the linear model • linear • only optimal for block, not whole signal • no phase-locking • representation depends on block alignmentNIPS 2007 Tutorial Michael S. Lewicki ! Carnegie Mellon
44.
A continuous ﬁlterbank does not form an efﬁcient code ⊗ ← → → → Goal: ﬁnd a representation that is both time-relative and efﬁcientNIPS 2007 Tutorial 44 Michael S. Lewicki ! Carnegie Mellon
45.
Efﬁcient signal representation using time-shiftable kernels (spikes)Smith and Lewicki (2005) Neural Comp. 17:19-45 M nm x(t) = sm,i φm (t − τm,i ) + (t) m=1 i=1 • Each spike encodes the precise time and magnitude of an acoustic feature • Figure 3: Smith, NC ms 2956 Two important theoretical abstractions for “spikes” - not binary - not probabilistic • Can convert to a population of stochastic, binary spikesNIPS 2007 Tutorial 45 Michael S. Lewicki ! Carnegie Mellon
46.
Coding audio signals with spikes 5000Kernel CF (Hz) 2000 1000K 500 200 100 0 5 10 15 20 25 ms Input Reconstruction Residual NIPS 2007 Tutorial 46 Michael S. Lewicki ! Carnegie Mellon
47.
Kernel CF (Hz) 2000Comparing a spike code to a spectrogram 1000 500 a 200 100 b c 5000 5000 FrequencyCF (Hz) 2000 Kernel (Hz) 4000 1000 3000 500 2000 200 a 1000 100 c b 100 200 300 400 500 600 700 800 ms 5000 Frequency (Hz) 4000 Kernel CF (Hz) 2000 3000 1000 2000 500 1000 200 100 100 200 300 400 500 600 700 800 ms c How do we compute the spikes?NIPS 2007 Tutorial 5000 47 Michael S. Lewicki ! Carnegie Mellon
48.
“can” with ﬁlter-threshold 5000Kernel CF (Hz) 2000 1000 500 200 100 0 5 10 15 20 ms Input Reconstruction ResidualNIPS 2007 Tutorial 48 Michael S. Lewicki ! Carnegie Mellon
49.
Spike Coding with Matching Pursuit 1. convolve signal with kernelsNIPS 2007 Tutorial 49 Michael S. Lewicki ! Carnegie Mellon
50.
Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. ﬁnd largest peak over convolution setNIPS 2007 Tutorial 50 Michael S. Lewicki ! Carnegie Mellon
51.
Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. ﬁnd largest peak over convolution set 3. ﬁt signal with kernelNIPS 2007 Tutorial 51 Michael S. Lewicki ! Carnegie Mellon
52.
Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. ﬁnd largest peak over convolution set 3. ﬁt signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutionsNIPS 2007 Tutorial 52 Michael S. Lewicki ! Carnegie Mellon
53.
Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. ﬁnd largest peak over convolution set 3. ﬁt signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeatNIPS 2007 Tutorial 53 Michael S. Lewicki ! Carnegie Mellon
54.
Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. ﬁnd largest peak over convolution set 3. ﬁt signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeatNIPS 2007 Tutorial 54 Michael S. Lewicki ! Carnegie Mellon
55.
Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. ﬁnd largest peak over convolution set 3. ﬁt signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeatNIPS 2007 Tutorial 55 Michael S. Lewicki ! Carnegie Mellon
56.
Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. ﬁnd largest peak over convolution set 3. ﬁt signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeatNIPS 2007 Tutorial 56 Michael S. Lewicki ! Carnegie Mellon
57.
Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. ﬁnd largest peak over convolution set 3. ﬁt signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeat . . .NIPS 2007 Tutorial 57 Michael S. Lewicki ! Carnegie Mellon
58.
Spike Coding with Matching Pursuit 1. convolve signal with kernels 2. ﬁnd largest peak over convolution set 3. ﬁt signal with kernel 4. subtract kernel from signal, record spike, and adjust convolutions 5. repeat . . . 6. halt when desired ﬁdelity is reachedNIPS 2007 Tutorial 58 Michael S. Lewicki ! Carnegie Mellon
59.
“can” 5 dB SNR, 36 spikes, 145 sp/sec 5000 Kernel CF (Hz) 2000 1000 500 200 100 0 50 100 150 200 ms Input Reconstruction ResidualNIPS 2007 Tutorial 59 Michael S. Lewicki ! Carnegie Mellon
60.
“can” 10 dB SNR, 93 spikes, 379 sp/sec 5000 Kernel CF (Hz) 2000 1000 500 200 100 0 50 100 150 200 ms Input Reconstruction ResidualNIPS 2007 Tutorial 60 Michael S. Lewicki ! Carnegie Mellon
61.
“can” 20 dB SNR, 391 spikes, 1700 sp/sec 5000 Kernel CF (Hz) 2000 1000 500 200 100 0 50 100 150 200 ms Input Reconstruction ResidualNIPS 2007 Tutorial 61 Michael S. Lewicki ! Carnegie Mellon
62.
“can” 40 dB SNR, 1285 spikes, 5238 sp/sec 5000 Kernel CF (Hz) 2000 1000 500 200 100 0 50 100 150 200 ms Input Reconstruction ResidualNIPS 2007 Tutorial 62 Michael S. Lewicki ! Carnegie Mellon
63.
Efﬁcient auditory coding with optimized kernel shapesSmith and Lewicki (2006) Nature 439:978-982 What are the optimal kernel shapes? M nm x(t) = sm,i φm (t − τm,i ) + (t) m=1 i=1 Adapt algorithm of Olshausen (2002)NIPS 2007 Tutorial 63 Michael S. Lewicki ! Carnegie Mellon
64.
Optimizing the probabilistic model M nm x(t) = sm φm(t − τim) + (t), i m=1 i=1 p(x|Φ) = p(x|Φ, s, τ )p(s)p(τ )dsdτ ≈ p(x|Φ, s, τ )p(ˆ)p(ˆ) ˆ ˆ s τ (t) ∼ N (0, σ )Learning (Olshausen, 2002): ∂ ∂ log p(x|Φ) = log p(x|Φ, s, τ ) + log p(ˆ)p(ˆ) ˆ ˆ s τ ∂φm ∂φm M n 1 ∂ m = [x − sm φm(t − τim)]2 ˆi 2σε ∂φm m=1 i=1 1 = [x − x] ˆ ˆi sm σε i Also adapt kernel lengthsNIPS 2007 Tutorial 64 Michael S. Lewicki ! Carnegie Mellon
65.
Adapting the optimal kernel shapesNIPS 2007 Tutorial 65 Michael S. Lewicki ! Carnegie Mellon
66.
Kernel functions optimized for coding speechNIPS 2007 Tutorial 66 Michael S. Lewicki ! Carnegie Mellon
67.
Quantifying coding efﬁciency 5000 1. ﬁt signal Kernel CF (Hz) 2000 2. quantize time and amplitude values 1000 spike code 3. prune zero values 4. measure coding efﬁciency using the 500 entropy of quantized values 200 5. reconstruct signal using quantized 100 0 50 150 values 6. measure ﬁdelity using Input signal-to-noise ratio (SNR) of residual error original • identical procedure for other codes (e.g. Fourier and wavelet) Reconstruction reconstruction M nm x(t) = sm,i φm (tResidual) + (t) − τm,i m=1 i=1 residual errorNIPS 2007 Tutorial Michael S. Lewicki ! Carnegie Mellon 67
69.
Using efﬁcient coding theoryComparing atheoretical a spectrogram to make spike code to predictions a b 5000 Natural Sound Environment Kernel CF (Hz) 2000 evolution theory 1000 500 A simple model of auditory coding 200 physiological data: 100 optimal kernels: How do we compute the spikes? sound • auditory nerve non-linearities waveform filterbank ﬁlter shapes static c stochastic spiking population spike code • properties • population trends 5000 • coding efﬁciency Frequency (Hz) 4000 auditory revcor ﬁlters: gammatones 5 3000 Speech Prediction ? Auditory Nerve Filters 1 2 3 4 5 6 7 8 9 10 2000 time (ms) 2 1000 Bandwidth (kHz) 1 More theoretical questions: • Why gammatones? 1 2 3 4 5 time (ms) 0.5 100 200 300 only compare 500 the data after optimizing 400 to 600 700 800 ms • Why spikes? Michael S. Lewicki ! Carnegie Mellon • How is sound coded by we do not ﬁt theBad Zwischenahn ! Aug 21, 2004 data 0.2 1 2 3 4 5 the spike population? time (ms) 0.1 0.1 0.2 0.5 1 2 5 Center Frequency (kHz) How do we develop a theory? Prediction depends on sound ensemble. deBoer and deJongh, 1978 Carney and Yin, 1988 Michael S. Lewicki ! Carnegie Mellon Bad Zwischenahn ! Aug 21, 2004NIPS 2007 Tutorial 69 Michael S. Lewicki ! Carnegie Mellon
70.
Learned kernels share features of auditory nerve ﬁlters Auditory nerve ﬁlters Optimized kernels from Carney, McDuffy, and Shekhter, 1999 scale bar = 1 msecNIPS 2007 Tutorial 70 Michael S. Lewicki ! Carnegie Mellon
71.
Learned kernels closely match individual auditory nerve ﬁlters for each kernel ﬁnd closet matching auditory nerve ﬁlter in Laurel Carney’s database of ~100 ﬁlters.NIPS 2007 Tutorial 71 Michael S. Lewicki ! Carnegie Mellon
72.
Learned kernels overlaid on selected auditory nerve ﬁlters For almost all learned kernels there is a closely matching auditory nerve ﬁlter.NIPS 2007 Tutorial 72 Michael S. Lewicki ! Carnegie Mellon
74.
How is this achieving an efﬁcient, time-relative code? Time-relative coding of glottal pulsesab 5000Kernel CF (Hz) 2000 1000 500 200 100 0 20 40 60 80 100 ms NIPS 2007 Tutorial 74 Michael S. Lewicki ! Carnegie Mellon
75.
$ explains data from principles Theory $ requires idealization and abstraction $ code signals accurately and efﬁciently Principle $ adapted to natural sensory environment Idealization $ analog spikes $ information theory, natural sounds Methodology $ optimization Prediction $ explains individual receptive ﬁelds $ explains population organizationNIPS 2007 Tutorial 75 Michael S. Lewicki ! Carnegie Mellon
76.
A gap in the theory? - - + - - Learned from Hubel, 1995NIPS 2007 Tutorial 76 Michael S. Lewicki ! Carnegie Mellon
77.
Redundancy reduction for noisy channels (Atick, 1992) s input x recoding Ax output y channel A channel y = x +" y = Ax +! Mutual information P (x, s) I(x, s) = P (x, s) log2 s,x P (s)P (x) I(x, s) = 0 iﬀ P (x, s) = P (x)P (s), i.e. x and s are independent.NIPS 2007 Tutorial 77 Michael S. Lewicki ! Carnegie Mellon
78.
Proﬁles of optimal ﬁlters • high SNR - - reduce redundancy - + - - - center-surround - - + - - + • low SNR - average - low-pass ﬁlter • matches behavior of retinal ganglion cellsNIPS 2007 Tutorial 78 Michael S. Lewicki ! Carnegie Mellon
79.
An observation: Contrast sensitivity of ganglion cells Luminance level decreases one log unit each time we go to lower curve.NIPS 2007 Tutorial 79 Michael S. Lewicki ! Carnegie Mellon
80.
Natural images have a 1/f amplitude spectrum Field, 1987NIPS 2007 Tutorial 80 Michael S. Lewicki ! Carnegie Mellon
81.
Components of predicted ﬁlters whitening ﬁlter low-pass ﬁlter optimal ﬁlter from Atick, 1992NIPS 2007 Tutorial 81 Michael S. Lewicki ! Carnegie Mellon
82.
Predicted contrast sensitivity functions match neural data from Atick, 1992NIPS 2007 Tutorial 82 Michael S. Lewicki ! Carnegie Mellon
83.
Robust coding of natural images • Theory reﬁned: - image is noisy and blurred - neural population size changes - neurons are noisy from Hubel, 1995(a) Undistorted image (b) Fovea retinal image (c) 40 degrees eccentricity Intensity -4 0 4 -4 0 4 Visual angle [arc min] from Doi and Lewicki, 2006NIPS 2007 Tutorial 83 Michael S. Lewicki ! Carnegie Mellon
84.
Problem 1: Real neurons are “noisy” Estimates of neural information capacity system (area) stimulus bits / sec bits / spike ﬂy visual (H1) motion 64 ~1 monkey visual (MT) motion 5.5 - 12 0.6 - 1.5 frog auditory (auditory nerve) noise & call 46 & 133 1.4 & 7.8 Salamander visual (ganglinon cells) rand. spots 3.2 1.6 cricket cercal (sensory afferent) mech. motion 294 3.2 cricket cercal (sensory afferent) wind noise 75 - 220 0.6 - 3.1 cricket cercal (10-2 and 10-3) wind noise 8 - 80 avg. = 1 Electric ﬁsh (P-afferent) amp. modulation 0 - 200 0 - 1.2 Limited capacity neural codes needBorst and Theunissen, 1999 After to be robust.NIPS 2007 Tutorial 84 Michael S. Lewicki ! Carnegie Mellon
85.
Traditional codes are not robust encoding neurons sensory input OriginalNIPS 2007 Tutorial 85 Michael S. Lewicki ! Carnegie Mellon
86.
Traditional codes are not robust encoding neurons Add noise equivalent to 1 bit precision sensory input Original 1x efﬁcient coding reconstruction (34% error) 1 bit precisionNIPS 2007 Tutorial 86 Michael S. Lewicki ! Carnegie Mellon
87.
al., to pairwise correla- al., st 2001; Schneidman ete correlations that can not just to pairwise correla- Redundancy plays an important role in neural coding ental Procedures). By can to all correlations thatxperimental Procedures). Bycy can be measuredmodel ofcan be measured undancy the light re-g any model of the light re- ancy was defined as edundancy was defined as Response of salamander retinal ganglion cells to natural moviesal information that that the mutual information thene conveyed aboutstim-veyed about the the stim- information conveyed d the information conveyed a As information rates).,Rb;S). As information ratesn of ganglion cells, wepulation of ganglion cells, we as a fraction of the minimumaction of the minimumal cells cells (Reichal., al.,dividual (Reich et et min{I(Ra;S), I(Rb;S)}, is the ancy I(Rb;S)}, is the Ra;S),between two cells, sobetween be nocells, sothan ncy can two greater fractional redundancyan be 0 when the two cells cy is no greater than 0 when the two cellsmation about the stimulus; its lls encode exactly the same about the stimulus; its ell’s information is a subset ode exactly the same ues of the redundancy mean formation is a subset c. the redundancy mean s of Ganglion Cells rocessing under realistic vi- anglion Cells ted retinas with a set of nat-sing underarealistic of envi- represent variety vi- From Puchalla et al, 2005 cell pair distance (%m) important a set of nat- of inas with characteristicsent a NIPS 2007 Tutorial by the variety of envi- field motion caused 87 Michael S. Lewicki ! Carnegie Mellon
88.
tance, the code. great numbersee, coding elementsintroduces redundancy into the cod ty of what if a As we will of robust coding are available while their coding pre o be Robust codingthe assumed noise in 2005,2006) case, separated from (Doi and Lewicki, the representation, unlike PCA or ICA the the apparent dimensionality could be increased (instead of reduced) while tha limited. Robust coding should make optimal use of such an arbitrary number of cod e present the optimal linear encoder andcoding introduces redundancy into the code lity of the code. As we will Encoder see, robust decoder when the coefﬁcients are noisy, an Decoder to be separated from the assumedthe different representation, unlike PCA or ICApaper r to the number of units and to noise in the data and noise conditions. This that x W u A ^ x , we formulate the problem. Then, in section III, we analyze the solutions for the gwe present theand two-dimensional case. Next, in when theIV, we applyare noisy, andons for one- optimal Data encoder and decoder section coefﬁcients the proposed linear Noiseless Reconstruction Representation y to theits considerable robustness different data and noise conditions. This codes. is nstrate number of units and to the compared against conventional image paper Fiudies. formulate the problem. Then, in section III, we analyze the solutions for the genII, we ions for one- and two-dimensional case. Next, in section IV, we apply the proposed ro Model limited precision usingonstrate its channel noise: robustnessPcompared FORMULATION additive considerable II. Channel Noise against conventional image codes. Fina ROBLEM n udies. (Fig. 1), we assume that the data is N -dimensional with zero mean and covmodel Encoder Decoderrices W ∈ RM ×N and A ∈ RN ×M . For each data point x, its representation r in r FORMULATION A ^ x perturbed byII.uP ROBLEM noise (i.e., channel noise) n ∼ N (0, σ 2 W xrough matrix W, the additive n Data Noiseless Noisy Reconstruction model (Fig. 1), we assume that the data is Representation Representation N -dimensional with zero mean and covaritrices W ∈ RM ×N and A ∈ r N ×M . For each= u + n. x, its representation r in th R = Wx + n data point channel noise) n ∼ N (0, σ 2 Ihrough matrix W, perturbed by the additive noise (i.e., vectors. The reconstructionnofM the encoding matrix and its row vectors as encodingnoisy representation using matrix A: + n = u + n. r = Wx Caveat: linear, but can evaluate for differents the encoding matrix and its row= Ar = encoding vectors. x vectors as AWx + An. ˆ Thenoise levels. reconstruction of adnoisy representation using matrix A: NIPS 2007 Tutorial 88 Michael The term AWx i the decoding matrix and its column vectors as decoding vectors. S. Lewicki ! Carnegie Mellon
89.
Generalizing the model: sensory noise and optical blur(a) Undistorted image (b) Fovea retinal image (c) 40 degrees eccentricity Intensity -4 0 4 -4 0 4 Visual angle [arc min] sensory noise channel noise ν δ only implicit optical blur encoder decoder s H x W r A s ˆ image observation representation reconstruction Can also add sparseness and resource constraintsNIPS 2007 Tutorial 89 Michael S. Lewicki ! Carnegie Mellon
90.
REVIEWSRobust coding is distinct from “reading” noisy populations a b In EQN 2, si is the direction (th 100 100 triggers the strongest respo width of the tuning curve, s 80 80 (so if s = 359° and si = 2°, the Activity (spikes s–1) Activity (spikes s–1) 60 60 ing factor. In this case, all t share a common tuning cur 40 40 preferred directions, si (FIG. 1 involve bell-shaped tuning cu 20 20 The inclusion of the no because neurons are known 0 0 –100 0 100 –100 0 100 neuron that has an average Direction (deg) Preferred direction (deg) stimulus moving at 90° mig c d occasion for a particular 90° 100 100 another occasion for exactly factors contribute to this va 80 80 trolled or uncontrollable as Activity (spikes s–1) Activity (spikes s–1) presented to the monkey, a 60 60 neuronal responses. In the collectively considered as ran 40 40 this noise causes important 20 20 transmission and processing which are solved by populati 0 0 we should be concerned no –100 0 100 –100 0 100 computes with population co Preferred direction (deg) Preferred direction (deg) From Pouget et al, 2001 reliably in the presence of suc Figure 1 | The standard population coding model. a | Bell-shaped tuning curves to direction for 16 neurons. b | A population pattern of activity across 64 neurons with bell-shaped tuning curves in Decoding population cod response to an object moving at –40°. The activity of each cell was generated using EQN 1, and In this section, we shall addr Here, we want to learn an optimal image code using noisy neurons. plotted at the location of the preferred direction of the cell. The overall activity looks like a ‘noisy’ hill what information about t centred around the stimulus direction. c | Population vector decoding fits a cosine function to the object is available from the r observed activity, and uses the peak of the cosine function, s as an estimate of the encoded ˆ, direction. d | Maximum likelihood fits a template derived from the tuning curves of the cells. More neurons? Let us take a hypothNIPS 2007 Tutorial precisely, the template is obtained from the noiseless (or average) population activity in response to S. Lewicki we record Mellon 90 Michael that ! Carnegie the activity of and that these neurons have
91.
∆W ∝ − (Cost) = 2 A (IN − AW)Σx − λ diag (Cost) = (Error) + λ (Var.∂W Const.) M (83) How do we learn robust codes? M sensory noise 2 channel noise u2 A = Σx WT (σn IM + WΣx WT 2 (Var. Const.) = ln ν 2 i δ (84) i=1 σu optical blur encoder decoder ∂ ⇔ (Cost) = O s H x W r A ∂A s ˆ image observation representation reconstruction 4 ln{diag(WΣx WT )/c} u22 A (IObjective: x − λ T N − AW)Σ diag T) WΣxi = i SNR (85) M diag(WΣx W 2 σn ! Find W and A that minimize reconstruction error. A •= Channel capacity of+ WΣneuron:−1 Σx WT (σn IM the ith x WT ) 2 u 2 = σu i 2 (86) 1 Ci = ln(SNRi + 1) ∂ ⇔ (Cost) = O 2 (87) ∂A • To limit capacity, ﬁx the coefﬁcient signal to noise ratio:√ λ 1 + λ2 √ √ √λ1 u2 2(1 + M SNR) λ2 SNRi = 2i 2 (88) σn Now robust coding is formulated as a 2 constrained optimization problem. N 1 1 u 2 = σu 2 E= M λi i N · SNR + 1 N i=1 (89) NIPS 2007 Tutorial 91 Michael S. Lewicki ! Carnegie Mellon
92.
Robust coding of natural images encoding neurons Add noise equivalent to 1 bit precision sensory input Original 1x efﬁcient coding reconstruction (34% error) 1 bit precisionNIPS 2007 Tutorial 92 Michael S. Lewicki ! Carnegie Mellon
93.
Robust coding of natural images encoding neurons Weights adapted for optimal robustness sensory input Original 1x robust coding reconstruction (3.8% error) 1 bit precisionNIPS 2007 Tutorial 93 Michael S. Lewicki ! Carnegie Mellon
94.
Reconstruction improves by adding neurons encoding neurons Weights adapted for optimal robustness sensory input Original 8x robust coding reconstruction error: 0.6% 1 bit precisionNIPS 2007 Tutorial 94 Michael S. Lewicki ! Carnegie Mellon
95.
Adding precision to 1x robust coding original images 1 bit: 12.5% error 2 bit: 3.1% errorNIPS 2007 Tutorial 95 Michael S. Lewicki ! Carnegie Mellon
96.
1 M ·2 (1, ·√ ,1 T ∈ RM . It takes a convenient fo where 1M SNR + 1· · 1) = Can 2(1 + M SNR) 2 average error bound derive minimum theoretical 2 λ2 2-D 2σx Isotropic E= M diag(V 2 · SNR + 1 when we deﬁne V (byλ1 + λ2 ) √ 22 N a linear transform of W, √ 2-D 1= 11 E = M E M · SNR + 1 Anisotropic λi if SNR ≥ SNRc N · SNR + 1 N i=1 2 2 V≡ λ1 E= = EDET √ where Σx· SNR + + λ2 the is if SNR ≤ SNRc eigenvalue decomposition of M 1 √ λ1 λi -≡ Dii are theof the data covariance. To summarize, the c N ith eigenvalue eigenvalues of Σx its λi 1 thatinputlinear transform. V should have row vectors of un E = MN- i=1 dimensionality . . · SNR + 1usN (neurons) N M Next, coding units √ a necessary condition for the min - # of let consider λN all A PPLICATION Regarding A, ∂E/∂A = O yields (se IV. free parameters.TO IMAGE CODINGata. we characterized the optimal solutions for 1-D and 2-D data. For1the high ection A = high-dim γ 2 ES(Iata, or theto be investigated. Here, we data. lower bound solutions for us remains eigenvalues of isotoropictheoretical numerical Algorithm achieves present σbustness to channel noise. To derive an optimal solution we can employ a gradons) (Note that Results Bound the necessary condition with respect to W (or efunction (its details are given in [3], [4]). W (or V) 19.9% is constrained to a certain subspace by the chaperformance of our proposed code when applied 20.3%test image. The data consists 0.5x to a sampled T ) = 1x 512×512 pixel image. We set the number andcodin Using the channel T 12.5% constraint (eq. 9) of the ne randomlydiag( uufrom the diag(WΣx 12.4% W capacity σ 2 1 ) = u M be8xsimpliﬁed as (see Appendix B) Balcan, Doi, and Lewicki, 2007; 2.0% 2.0% Balcan and Lewicki, 2007 NIPS 2007 Tutorial 96 E = tr{D · ( Michael S. Lewicki ! Carnegie Mellon
97.
Sparseness localizes the vectors and increases coding efﬁciency normalized histogram encoding vectors (W) decoding vectors (A) of coeff. kurtosis Kurtosis of RC 100 Kurtosis of RC 100 avg. = 4.9 80 80robust coding % % frequency 60 frequency 60 % 40 40 20 20 0 0 4 6 8 10 4 6 kurtosis 8 10 average error is unchanged kurtosis kurtosis 12.5% & 12.5% Kurtosis of RSC 100 Kurtosis of RSC 100 avg. = 7.3robust sparse coding 80 80 % % frequency 60 frequency 60 %40 40 20 20 0 0 4 6 8 10 4 6 kurtosis 8 10 kurtosis kurtosis NIPS 2007 Tutorial 97 Michael S. Lewicki ! Carnegie Mellon
98.
Non-zero resource constraints localize weights (a) 20dB 10dB 0dB -10dB fovea (d) 20dB (b) Magnitude -10dB (c) Spatial freq. average error is also unchanged 40 degrees20dB -10dBNIPS 2007 Tutorial 98 Michael S. Lewicki ! Carnegie Mellon
99.
$ explains data from principles Theory $ requires idealization and abstraction $ code signals accurately, efﬁciently, robustly Principle $ adapted to natural sensory environment $ cell response is linear, noise is additive Idealization $ simple, additive resource constraints $ information theory, natural images Methodology $ constrained optimization $ explains individual receptive ﬁelds Prediction $ explains non-linear response $ explains population organizationNIPS 2007 Tutorial 99 Michael S. Lewicki ! Carnegie Mellon
100.
What happens next? - - + - - Learned from Hubel, 1995NIPS 2007 Tutorial 100 Michael S. Lewicki ! Carnegie Mellon
101.
What happens next?NIPS 2007 Tutorial 101 Michael S. Lewicki ! Carnegie Mellon
102.
of filters with strongly matic gain control known as ‘divisive normalization’ that has beenis larger; for a pair that used to account for many nonlinear steady-state behaviors of neu- Joint statisticsrons in primary visual cortex10,20,21. Normalization models havertion is zero. Because L2 of ﬁlter outputs show magnitude dependenceber of other filters with- been motivated by several basic properties. First, gain control ralization of this condi- portional to a weighted neighborhood and an histo{ L | L ! 0.1 } histo{ L2 | L1 ! 0.9 } 2 1optimal weights and an 1 1ihood of the conditionales or sounds (Methods, 0.6 0.6er for pairs of filters that 0.2 0.2ge as seen through two lin- ical filter (L2), conditioned -1 0 1 -1 0 1diagonal spatially shifted fil-r all image positions, and a 1he frequency of occurencensional histograms are ver-widths of these histogramse not statistically indepen- L2 0 histo{ L2| L1 } ull two-dimensional condi- tional to the bin counts,escaled to fill the range ofhly decorrelated (expected -1 of L1) but not statistically -1 0 1 stribution of L2 increases Ltive) of L1. 1 nature neuroscience • volume 4 no 8 • Schwartz and Simoncelli (2001) from august 2001 NIPS 2007 Tutorial 102 Michael S. Lewicki ! Carnegie Mellon
103.
Using sensory gain control to reduce redundancySchwartz and Simoncelli (2001) 1 4 Fig. 4. Generic no response is divided A divisive boring filters and a normalization 0 Maximum Likeliho model The conditional his that the variance o 0 -1 -1 0 1 0 4 gram is a represen Other squared a particular mecha filter responses 2 ! F1 that is primarily e In Fig. 7b, the hig Stimulus quency bands wi F2 2 Adapt weights to DISCUSSION ! Other squared We have describ minimize higher- filter responses order dependencies processing, in w from natural images divided by a gain the squared lineaNIPS 2007 Tutorial 103 Michael S. Lewicki ! Carnegie Mellonmasking tone. As in the visual data, the rate–level curves of the stant. The form
A particular slide catching your eye?
Clipping is a handy way to collect important slides you want to go back to later.
Be the first to comment