Biologically-inspired Active Vision System for Object Recognition

Martin Peniak, Davide Marocco
University of Plymouth

Ron Babich, John Tran
NVIDIA Research

Outline

1. Introduction
   a. Biological vision vs Computer vision
   b. The role of active perception
   c. Neural networks and Genetic Algorithms
2. Background
   a. Presentation of related research (Marocco, Floreano, etc.)
3. Preliminary Experiments
   a. Method (neural networks + genetic algorithms on GPU)
   b. Results (video of evolved controllers)
4. Conclusions

A long-standing challenge in robotics is the development of a truly robust and general-purpose vision system suitable for object identification, navigation, and other tasks. An unconventional but promising approach for tackling this challenge relies on the concept of active perception, inspired by the observation that biological organisms interact with the world in order to make sense of it. In the context of vision, this argues for a system that takes in only a small part of the scene at a time (mimicking that captured by the fovea in the human eye), moving from one such part to another in rapid succession. By leveraging a neural network for control, it is possible to evolve an active vision system with the desired characteristics.

Prior work has relied on very small arrays of photoreceptors (e.g., 5x5), applied to simple identification tasks such as distinguishing a triangle from a square. Although valuable as proofs of concept, tackling real-world problems will require much larger systems backed by much larger neural networks, where the computational cost of training grows super-linearly. We thus turn to an efficient CUDA implementation, scalable to many GPUs in parallel.

Our system is based on an Elman-type recurrent neural network with a biologically-inspired retina. The neural network is evolved through a genetic algorithm incorporating the island model, which involves segregated populations whose members migrate between “islands” only infrequently.
This design both facilitates parallel scaling and improves the quality of the final solution by avoiding convergence to local optima.

The active vision system was required to learn to recognize five different objects from the Amsterdam Library of Object Images (ALOI). These objects were presented to the system during the evolutionary process under 16 different illuminations and at 36 different rotation angles. Every neural network controller was able to explore each of these variations in parallel on the GPU, which made the evolutionary process significantly faster than multi-threaded CPU code. At the end of evolution, the controllers with the highest fitness were able to successfully recognize all the objects within 20 time-steps. Our preliminary results suggest that this system is tolerant to variations in object rotation, position and scale.
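
The island model described above can be illustrated with a minimal sketch. It is written in plain Python rather than CUDA and is not the implementation used in this work. The toy fitness function, truncation selection, Gaussian mutation, ring-shaped migration and all parameter values are assumptions chosen for illustration only. A real controller would be scored on recognition performance rather than on this stand-in objective.

```python
import random


def fitness(genome):
    # Hypothetical stand-in for the recognition score:
    # higher is better, with a maximum of 0 at the origin.
    return -sum(g * g for g in genome)


def evolve_island(population, rng, mutation_sigma=0.1):
    """One generation on one island: keep the best half unchanged,
    refill the rest with Gaussian-mutated copies of the survivors."""
    ranked = sorted(population, key=fitness, reverse=True)
    elite = ranked[: len(ranked) // 2]
    children = [
        [g + rng.gauss(0.0, mutation_sigma) for g in rng.choice(elite)]
        for _ in range(len(population) - len(elite))
    ]
    return elite + children


def migrate(islands):
    """Ring migration (an assumed topology): a copy of each island's best
    individual replaces the worst individual on the next island."""
    bests = [max(pop, key=fitness) for pop in islands]
    for i, pop in enumerate(islands):
        incoming = bests[(i - 1) % len(islands)]
        worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
        pop[worst] = list(incoming)


def run(num_islands=4, pop_size=20, genome_len=8, generations=60,
        migration_interval=15, seed=0):
    rng = random.Random(seed)
    islands = [
        [[rng.uniform(-1.0, 1.0) for _ in range(genome_len)]
         for _ in range(pop_size)]
        for _ in range(num_islands)
    ]
    initial_best = max(fitness(g) for pop in islands for g in pop)
    for generation in range(1, generations + 1):
        # Islands evolve independently, so this loop could run in parallel.
        islands = [evolve_island(pop, rng) for pop in islands]
        # Individuals cross between islands only infrequently.
        if generation % migration_interval == 0:
            migrate(islands)
    final_best = max(fitness(g) for pop in islands for g in pop)
    return initial_best, final_best


if __name__ == "__main__":
    before, after = run()
    print("best fitness before:", before, "after:", after)
```

Calling run() returns the best fitness found before and after evolution. Because the islands exchange individuals only every migration_interval generations, each one can be evolved independently in between, which is the property that makes the scheme map naturally onto multiple GPUs.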