Visual Parsing after Recovery from Blindness
Brain CS Reading Group

braincs.wiki.zoho.com

Lunch Talk
11 Mar 2010

Presentation Transcript

  • Visual Parsing After Recovery from Blindness (Ostrovsky et al., Psychological Science, 2009). Presented by Daniel O’Shea (djoshea@stanford.edu), Brain CS Lunch Talk, 11 March 2010.
  • Outline:
    Overview: Project Prakash; Learning to Recognize Objects; Subjects.
    Tests of Visual Parsing: Static Visual Parsing; Dynamic Visual Parsing; Recovery of Static Object Recognition.
    Conclusion: Motion Information as a Scaffold; Experimental and Theoretical Evidence for Temporal Learning; Future Directions.
  • Project Prakash: “The goal of Project Prakash is to bring light into the lives of curably blind children and, in so doing, illuminate some of the most fundamental scientific questions about how the brain develops and learns to see.” (http://web.mit.edu/bcs/sinha/prakash.html)
  • Project Prakash: Launched in 2003 by Pawan Sinha at MIT BCS. Based in New Delhi; India is home to 30% of the world’s blind population. Humanitarian aim: screen children for treatable blindness and provide modern treatment.
  • Scientific opportunity: How does the visual system learn to perceive objects? Recently treated subjects allow object learning to be tracked in longitudinal studies. This is complementary to infant studies, which offer more plastic visual processing but are limited in experimental complexity, and it separates visual learning from sensory development.
  • Learning to Recognize: The visual system must learn to “parse” the visual world into meaningful entities. Segmentation research has focused on heuristics such as alignment of contours and similarity of texture representations. (Fig. 1: a natural image (a) is typically a collection of many regions of different hues and luminances (b); the visual system must integrate subsets of these regions into coherent objects (c).)
  • Central questions: Is there a critical period for learning visual parsing? There is evidence that the mature visual system uses these heuristics for object recognition, but are they useful for organizing visual information during development?
  • Subjects:
    Recovery subjects: S.K., 29 y.o., congenital bilateral aphakia (absence of the lens), corrected with eyeglasses; P.B., 7 y.o., and J.A., 13 y.o., both treated with bilateral cataract surgery with intraocular lens implantation.
    Control subjects: visual input scaled to simulate the information loss due to the treated group’s imperfect acuity (20/80 to 20/120).
  • Tests of Static Visual Parsing (seven tasks; Fig. 2):
    A. Non-overlapping shapes (“How many objects?”): no difficulty.
    B. Overlapping shapes as line drawings: overfragmentation; all closed loops perceived as distinct.
    C. Overlapping shapes as filled translucent surfaces: overfragmentation; all regions of constant luminance perceived as distinct.
    D. Opaque overlapping shapes (“How many objects?”): could correctly indicate the number of objects.
    E. Opaque overlapping shapes (“Which object is in front?”): could not indicate depth ordering (chance performance).
    F. Field of oriented line segments (“Trace the long curve”): difficulty finding the path of coherent segments.
    G. 3D shapes with luminance variation from lighting and shadows: percept of multiple objects, one per facet.
    Conclusion: a tendency to overfragment, driven by low-level cues (regions of uniform hue and luminance), and an inability to use cues of contour continuation, junction structure, and figural symmetry.
    (Fig. 2 caption: Seven tasks (a) assessed the recently treated subjects’ ability to perform simple image segmentation and shape analysis; (b) shows their performance relative to the control subjects, with “NA” marking data not available for a subject. S.K.’s tracing of a pattern drawn by one of the authors (c) illustrates the fragmented percepts of the recently treated subjects. In the upper row of (d), outlines indicate the regions of real-world images that S.K. saw as distinct objects; he was unable to recognize any of these images. For comparison, the lower row of (d) shows the segmentation of the same images by a simple algorithm that agglomerated spatially adjacent regions satisfying a threshold criterion of similarity in their hue and luminance attributes.)
  • Object Tracing: The overfragmented percept is evident in object tracing (Fig. 2c). S.K.’s segmentation of real-world images (Fig. 2d, upper row) is consistent with an algorithm based on a hue/luminance similarity metric (Fig. 2d, lower row); a sketch of such an algorithm follows.
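The paper does not publish its comparison algorithm; the following is a minimal sketch of that kind of segmenter under assumed parameters (the thresholds, the 4-neighbour flood fill, and the seed-based merge criterion are all choices of this sketch, not the authors’): agglomerate spatially adjacent pixels whose hue and luminance both stay within a threshold of the segment’s seed.

```python
import numpy as np
from collections import deque

def segment_by_hue_luminance(hue, lum, hue_thresh=0.05, lum_thresh=0.1):
    """Greedily agglomerate spatially adjacent pixels whose hue and
    luminance both stay within a threshold of the segment's seed pixel.
    hue, lum: (H, W) float arrays. Returns an integer label map."""
    h, w = lum.shape
    labels = -np.ones((h, w), dtype=int)
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy, sx] != -1:
                continue
            labels[sy, sx] = next_label
            queue = deque([(sy, sx)])
            while queue:  # 4-neighbour flood fill from the seed pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny, nx] == -1
                            and abs(hue[ny, nx] - hue[sy, sx]) < hue_thresh
                            and abs(lum[ny, nx] - lum[sy, sx]) < lum_thresh):
                        labels[ny, nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels

# Toy stimulus echoing Test C: two overlapping translucent squares produce
# three constant-luminance foreground regions, so the segmenter reports
# three "objects" (plus background) where an observer sees two shapes.
lum = np.zeros((20, 20))
lum[2:12, 2:12] += 0.4
lum[8:18, 8:18] += 0.4
hue = np.zeros_like(lum)
print(len(np.unique(segment_by_hue_luminance(hue, lum))))  # 4 labels
```

On the toy Test C stimulus this reproduces the subjects’ overfragmentation pattern: every region of constant luminance becomes its own “object.”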
  • Dynamic Visual Input: Natural visual experience is dynamic and includes motion. Show individual shapes undergoing independent smooth translational motion and study the percept in a dynamic context.
  • Dynamic Visual Parsing: The task was the same as in the static tests, to indicate the number of objects shown, but the stimuli incorporated motion cues: the individual shapes underwent independent smooth translational motion, with overlapping figures constrained so that the overlap was maintained at all times (Fig. 4, Tests A and B). The inclusion of motion brought about a dramatic change: the recently treated subjects responded correctly on a majority of the trials. Motion also allowed S.K. to better perceive shapes embedded in noise (Fig. 4, Tests C and D). Motion thus appeared instrumental in enabling the recently treated subjects to link together the parts of an object (binding) and segregate them from the background, even in the presence of noise. (Fig. 4 caption: recently treated and control subjects’ performance on tasks assessing the role of dynamic information in object segregation; “NA” indicates data not available for a subject; arrows in the task illustrations indicate direction of movement.)
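The dynamic displays are not published as code; as a minimal sketch of the binding principle this slide describes (assuming a precomputed dense optical-flow field as input, which the paper’s smoothly translating stimuli would yield trivially), pixels can be grouped by common translational motion, separating coherently moving shapes from a static background regardless of hue or luminance boundaries:

```python
import numpy as np

def group_by_common_motion(flow, min_speed=0.5, quant=0.5):
    """Bind pixels into objects by common translational motion.
    flow: (H, W, 2) optical-flow field (dx, dy) between two frames.
    Pixels slower than min_speed are background (label 0); moving pixels
    sharing the same quantized velocity get the same label, even when
    they straddle hue/luminance boundaries."""
    speed = np.linalg.norm(flow, axis=2)
    moving = speed >= min_speed
    labels = np.zeros(flow.shape[:2], dtype=int)
    # Quantize velocities so pixels of one rigidly translating object
    # (identical flow up to noise) map to the same key.
    keys = np.round(flow[moving] / quant).astype(int)
    if keys.size:
        _, inv = np.unique(keys, axis=0, return_inverse=True)
        labels[moving] = inv.ravel() + 1
    return labels
```

With two shapes translating at different velocities over static noise, as in Fig. 4 Tests A and B, this returns label 0 for the background and one label per shape.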
  • Static Recognition and Object Motility: More “motile” objects were more likely to be recognized; the recently treated subjects’ recognition responses showed significant congruence with independently derived motility ratings of the depicted objects (p < .01, chi-square test, for each of the 3 subjects). Object motion (alongside low-level cues) binds regions into a cohesive whole, and learning this cohesive structure facilitates recognition even in a static context. (Fig. 3: motility ratings of the 50 images used to test object recognition, obtained from 5 normally sighted respondents naive to the purpose of the experiment, on a scale from 1, very unlikely to be seen in motion, to 5, very likely; circles indicate correct recognition responses.)
  • Follow-up Test of Static Visual Parsing: S.K. (at 18 months post-treatment), P.B. (10 months), and J.A. (12 months) registered significant improvement in static visual parsing. A critical period for visual learning should therefore not be strictly applied in predicting therapeutic gains. (Fig. 5: the recently treated subjects’ performance on four tasks with static displays soon after treatment and at follow-up testing several months later.)
  • Motion Information Available Early: Motion sensitivity develops early in the primate brain. In infants, segmentation from motion appears about two months before segmentation from static cues, and infants link the separated parts of a partially occluded object through their common motion.
  • Motion Information as a Scaffold: Motion allows early grouping of parts. Correlation of motion-based groupings with static cues (e.g., aligned contours) then facilitates grouping via static cues alone; see the sketch below.
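The paper states the scaffold hypothesis but does not model it. One illustrative sketch (this reviewer’s construction, not from the paper): let motion-derived segment labels supervise a classifier over static cues, which can afterwards group pixels in still images. The feature set and the logistic model are assumptions of this sketch.

```python
import numpy as np

def motion_supervised_static_grouper(hue, lum, motion_labels,
                                     epochs=200, lr=0.5):
    """Train a logistic 'same object?' predictor for horizontally adjacent
    pixel pairs. Static features: |delta hue| and |delta luminance|.
    Supervision comes for free from motion: a pair is positive iff both
    pixels carry the same motion-segmentation label. Returns (w, b)."""
    dh = np.abs(np.diff(hue, axis=1)).ravel()
    dl = np.abs(np.diff(lum, axis=1)).ravel()
    X = np.stack([dh, dl], axis=1)
    y = (motion_labels[:, :-1] == motion_labels[:, 1:]).ravel().astype(float)
    w, b = np.zeros(2), 0.0
    for _ in range(epochs):  # plain gradient descent on cross-entropy loss
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    return w, b
```

Once trained, (w, b) scores pixel pairs in still images with no motion present at all, which is the sense in which motion can scaffold static grouping.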
  • Temporal Contiguity and IT Cortex: Li and DiCarlo, “Unsupervised natural experience rapidly alters invariant object representation in visual cortex,” Science, 2008. Taking advantage of the fact that primates are effectively blind during the brief time it takes to complete a saccade, the authors created an “altered” visual world in which two objects consistently swapped identity across temporally contiguous retinal images: a monkey freely viewing a monitor saw the image of one object at a peripheral retinal position (the swap position) made temporally contiguous, via the saccade, with the retinal image of the other object at the center of the retina. Recording from 101 IT neurons (n = 50 in monkey 1, 51 in monkey 2) across alternating test and exposure phases, they measured object selectivity at each position as the difference in response to the two objects (P − N) and found that, at the swap position, IT neurons on average decreased their initial object selectivity, with little or no change at the non-swap position. Disruption of the temporal contiguity of visual experience thus alters the object specificity of IT neurons.
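Li and DiCarlo report physiology rather than a model, but a standard modeling counterpart of temporal-contiguity learning is the trace rule of Földiák (1991): a Hebbian update gated by a temporal trace of activity, so that inputs occurring close together in time come to drive the same unit, including two objects artificially swapped across a simulated saccade. A minimal sketch, with all parameters assumed:

```python
import numpy as np

def trace_rule(stimuli, n_units=2, eta=0.05, delta=0.2, seed=0):
    """Foldiak (1991) trace rule. stimuli: (T, D) sequence of input vectors.
    Each unit's trace ybar mixes its current winner-take-all response with
    its recent past, so the Hebbian update associates inputs that occur in
    temporal contiguity, e.g. two objects swapped across a 'saccade'."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(n_units, stimuli.shape[1]))
    ybar = np.zeros(n_units)
    for x in stimuli:
        y = np.zeros(n_units)
        y[np.argmax(W @ x)] = 1.0               # winner-take-all response
        ybar = (1 - delta) * ybar + delta * y   # temporal trace of activity
        W += eta * ybar[:, None] * (x[None, :] - W)  # trace-gated Hebb step
        W /= np.linalg.norm(W, axis=1, keepdims=True)
    return W
```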
  • Invariant Object Recognition with Slow Feature Analysis: The slowness principle can be used to learn independent representations of object position, angle, and identity from smoothly moving stimuli. Franzius, Wilbert, and Wiskott, ICANN 2008. (Fig. 2 of that paper: an input image is fed into a hierarchical network of SFA nodes with overlapping receptive fields that converge toward the top layer; the same set of processing steps is applied at each layer.)
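Franzius et al. build a hierarchy of SFA nodes; the core computation in each node is SFA itself. A minimal linear-SFA sketch (the full method adds a nonlinear expansion before this step; this sketch assumes a full-rank input covariance): whiten the signal, then keep the whitened directions along which the temporal derivative has the least variance.

```python
import numpy as np

def linear_sfa(x, n_components=2):
    """Linear slow feature analysis. x: (T, D) time series with T > D and
    full-rank covariance. Whiten, then take the whitened directions along
    which the discrete temporal derivative has the least variance."""
    x = x - x.mean(axis=0)
    d, E = np.linalg.eigh(np.cov(x, rowvar=False))
    white = (x @ E) / np.sqrt(d)                 # unit variance, decorrelated
    dcov = np.cov(np.diff(white, axis=0), rowvar=False)
    _, E2 = np.linalg.eigh(dcov)                 # eigh sorts ascending,
    return white @ E2[:, :n_components]          # so slowest features first
```

Applied to views of an object translating smoothly, the slowest components vary with position and identity far more slowly than any individual pixel does.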
  • Learning Static Segmentation from Motion: Segmentation According to Natural Examples (SANE). Ross and Kaelbling, “Segmentation According to Natural Examples: Learning Static Segmentation from Motion Segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008. In the reported comparisons, color multiresolution SANE outperformed the Martin boundary detectors on all four data sets at all feature radii under the strict matching distance of 1; as the maximum match distance relaxes to 3 and 5, the Martin results improve, especially in recall, but SANE’s advantage lies in its higher precision: the Martin detectors find object boundaries readily but also detect many non-object boundaries. LOCUS offers a contrasting approach: it jointly segments a collection of images using a global probabilistic shape template, whereas SANE segments each image independently, applying previously learned models of image properties to new images.
  • Structure from Motion without Correspondence: Dellaert, Seitz, Thorpe, and Thrun, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2000. The method recovers 3D structure and camera motion from multiple images without known correspondences, via EM: (1) generate an initial structure and motion estimate; (2) given the estimate and the data, run a Metropolis sampler in each image, keeping running counts of how often each measurement is assigned to each feature to approximate the marginal assignment probabilities; (3) compute virtual measurements from these weights; (4) re-estimate structure and motion from the virtual measurements using any SFM method compatible with the assumed projection model; (5) repeat until convergence. To avoid local minima, annealing artificially increases the noise parameter in early iterations and decreases it exponentially (25 pixels down to 1 pixel in the 11-image cube example, where the temporal ordering of the images is irrelevant to the method). In practice the algorithm converges consistently and quickly to an estimate in which the correct correspondence is the most probable one.
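Dellaert et al.’s EM/MCMC algorithm is too involved to sketch here, but step 4 accepts “any SFM method compatible with the projection model.” One classic choice for affine cameras, shown below, is the Tomasi-Kanade factorization (1992); note this is the standard factorization method, which assumes known correspondences, and not the paper’s algorithm.

```python
import numpy as np

def tomasi_kanade(W):
    """Affine structure from motion by factorization (Tomasi & Kanade, 1992).
    W: (2F, P) measurement matrix stacking the x rows then y rows of P
    points tracked over F frames. Returns (M, S): a (2F, 3) camera matrix
    and (3, P) structure, up to an affine ambiguity (the metric upgrade via
    orthonormality constraints is omitted in this sketch)."""
    W = W - W.mean(axis=1, keepdims=True)    # register to per-frame centroids
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])            # noise-free affine W has rank 3
    S = np.sqrt(s[:3])[:, None] * Vt[:3]     # so keep the top-3 SVD factors
    return M, S
```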
  • Future Directions (suggestions for future presentations): Evidence for motion-based learning of object binding? Evidence from motion-alteration or temporal-contiguity-altering experiments? Connections between neuroscience and SFA? Bootstrapped learning of novel objects?