
“Event-Based Neuromorphic Perception and Computation: The Future of Sensing and AI,” a Keynote Presentation from Ryad Benosman


For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2022/08/event-based-neuromorphic-perception-and-computation-the-future-of-sensing-and-ai-a-keynote-presentation-from-ryad-benosman/

Ryad Benosman, Professor at the University of Pittsburgh and Adjunct Professor at the CMU Robotics Institute, presents the “Event-Based Neuromorphic Perception and Computation: The Future of Sensing and AI” tutorial at the May 2022 Embedded Vision Summit.

We say that today’s mainstream computer vision technologies enable machines to “see,” much as humans do. We refer to today’s image sensors as the “eyes” of these machines. And we call our most powerful algorithms deep “neural” networks. In reality, the principles underlying current mainstream computer vision are completely different from those underlying biological vision. Conventional image sensors operate very differently from eyes found in nature, and there’s virtually nothing “neural” about deep neural networks. Can we gain important advantages by implementing computer vision using principles of biological vision? Professor Ryad Benosman thinks so.

Mainstream image sensors and processors acquire and process visual information as a series of snapshots recorded at a fixed frame rate, resulting in limited temporal resolution, low dynamic range and a high degree of redundancy in data and computation. Nature suggests a different approach: Biological vision systems are driven and controlled by events within the scene in view, and not – like conventional techniques – by artificially created timing and control signals that have no relation to the source of the visual information.

The term “neuromorphic” refers to systems that mimic biological processes. In this talk, Professor Benosman — a pioneer of neuromorphic sensing and computing — introduces the fundamentals of bio-inspired, event-based image sensing and processing approaches, and explores their strengths and weaknesses. He shows that bio-inspired vision systems have the potential to outperform conventional, frame-based systems and to enable new capabilities in terms of data compression, dynamic range, temporal resolution and power efficiency in applications such as 3D vision, object tracking, motor control and visual feedback loops.



“Event-Based Neuromorphic Perception and Computation: The Future of Sensing and AI,” a Keynote Presentation from Ryad Benosman

  1. 1. R.B. Benosman, McGowan Institute, BST-3, Rm 2046, 3501 Fifth Avenue Pittsburgh, PA 15213 benosman@pitt.edu Event-based Neuromorphic Perception and Computation: The Future of Sensing and AI 1
  2. 2. R.B. Benosman, McGowan Institute, BST-3, Rm 2046, 3501 Fifth Avenue Pittsburgh, PA 15213 benosman@pitt.edu Event-based Neuromorphic Perception and Computation: The Future of Sensing and AI 2
  3. 3. Historic Timeline. 1958: Frank Rosenblatt's Perceptron (AI hype?). 1959: Russell's infant son, a 5cm by 5cm scan (176x176 array), now at the Portland Art Museum; David Hubel and Torsten Wiesel publish "Receptive fields of single neurons in the cat's striate cortex". 1963: Lawrence Roberts' "Machine perception of three-dimensional solids": processing 2D photographs to build up 3D representations from lines. 1966: the MIT Summer Vision Project (Seymour Papert): automatically segment background/foreground and extract non-overlapping objects. 1969: Azriel Rosenfeld (1931-2004), "father of computer vision" (Ph.D. in mathematics, Columbia, 1957; professor at UMD and ordained rabbi), writes the first textbook in the field. Early applications of image analysis: character and digit recognition (first OCR conference in 1962), microscopy and cytology, interpretation of aerial images (even before satellites!), particle physics (Hough transform for analysis of bubble chamber photos published in 1959), face recognition (article about W. Bledsoe), fingerprint recognition. 1982: David Marr, "Vision: a computational investigation into the human representation and processing of visual information": vision is hierarchical; books by Marr (1982), Ballard & Brown (1982), Horn (1986). 1990s: geometry reigns: (projective) 3D reconstruction, fundamental matrix (Faugeras 1992), normalized 8-point algorithm (Hartley 1997), RANSAC for robust fundamental matrix estimation (Torr & Murray 1997), bundle adjustment (Triggs et al. 1999), the Hartley & Zisserman book (2000), projective structure from motion (Faugeras and Luong 2001). 2000s: feature-based pattern recognition, keypoints, 3D reconstruction gets "solved", generic object recognition. 3
  4. 4. Historic Timeline (continued): the same timeline repeated, now ending with a question mark over what comes after the 2000s. 4
  5. 5. • Computer vision has been reinvented at least three times. • Too close to the market: application-based research, with a tendency to resist novelty, choosing applications over potentially more promising methods that could not yet deliver. • Not idea driven. Topic distribution of submissions for ICCV 2019. From Svetlana Lazebnik, Computer Vision: Looking Back to Look Forward (2020). What Went Wrong? 5
  6. 6. What Went Wrong? 6
  7. 7. • Images are the optimal structure of data • Grey Levels as source of information Why Are We Using Images? 7
  8. 8. • Invention of the camera obscura in 1544 (L. Da Vinci?) • The mother of all cameras Computer Vision: a Heritage from Art! 8
  9. 9. Origins of Imaging • Increasing profits: painting faster • Evolution from portable models for travellers to current digital cameras • Evolving from canvas, to paper, to glass, to celluloid, to pixels 9
  10. 10. Origins of Video: Motion Picture • Early work in motion-picture projection • Pioneering work on animal locomotion in 1877 and 1878 • Used multiple cameras to capture motion in stop-motion photographs Eadweard Muybridge (1830-1904) 10
  11. 11. Computer Vision, the Impossible Trade-Off! Power vs. frame rate: redundant, useless data; power and resource hungry (need to acquire/transmit/store/process); motion blur; displacement between frames. High Power & High Latency. 11
  12. 12. Event Acquisition • Reduce the data load and only detect "meaningful" events, at the time they happen! • Avoid burning energy to acquire, transmit and store information that ends up being trashed. • There is no generic solution: there are almost an infinite number of ways to extract events, and they need to be adapted to the dynamics and nature of the data. 12
  13. 13. Event Acquisition • New information is detected when it happens • When nothing happens, nothing is sent or processed • Sparse information coding. Embedded figure: two ways to sample a function f(t), (a) at a constant scale on the time axis, (b) at a constant scale on the values of f(t); each change of f(t) by a set amount Δf generates an event Ev(t) of polarity +1 or -1 at the exact time it occurs. Popular solution: sample on the amplitude axis of signals. Time is the most valuable information. 13
  14. 14. Event Acquisition (continued): the same figure, sampling on the amplitude axis of signals rather than on the time axis. Popular solution: sample on the amplitude axis of signals. Time is the most valuable information. 14
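The sampling scheme sketched on slides 13 and 14 (emit an event each time the signal moves by a set amount Δf along the amplitude axis, instead of sampling at a fixed rate on the time axis) can be illustrated with a minimal send-on-delta encoder. This is a sketch under assumptions, not code from the talk; the function name and threshold value are illustrative.

```python
import numpy as np

def send_on_delta(signal, timestamps, delta):
    """Emit (time, polarity) events whenever the signal moves by +/- delta
    from the last emitted level: a minimal sketch of amplitude-axis sampling."""
    events = []
    ref = signal[0]                      # last level that triggered an event
    for t, f in zip(timestamps, signal):
        while f - ref >= delta:          # signal rose by at least delta
            ref += delta
            events.append((t, +1))
        while ref - f >= delta:          # signal fell by at least delta
            ref -= delta
            events.append((t, -1))
    return events

# Example: a densely sampled pixel intensity trace
t = np.linspace(0.0, 1.0, 10_000)
f = np.sin(2 * np.pi * 3 * t) + 0.5 * t
evts = send_on_delta(f, t, delta=0.1)
print(f"{len(evts)} events instead of {len(t)} samples")
```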
  15. 15. Embedded excerpt from an event-based vision sensor paper: the TMPDIFF128 camera system with a USB2.0 interface. The vision sensor sends address-events to the USB interface, which timestamps them with a 100 kHz counter on a shared 16-bit bus; the events are buffered by USB FIFOs, unwrapped to 32-bit timestamps on the host PC, and offered to other threads for event-by-event processing (labeling, tracking, filtering based on past event times); all code is open-source (sourceforge.net). 15
  16. 16. Christoph Posch, Daniel Matolin and Rainer Wohlgenannt, "A QVGA 143 dB Dynamic Range Frame-Free PWM Image Sensor With Lossless Pixel-Level Video Compression and Time-Domain CDS", IEEE Journal of Solid-State Circuits, vol. 46, no. 1, January 2011, p. 259: a biomimetic CMOS dynamic vision and image sensor with a QVGA (304x240) array of fully autonomous pixels combining event-based change detection and PWM exposure measurement. Pixels that are not stimulated produce no output; temporal contrast and grayscale data are communicated as asynchronous address-events (AER), with grayscale encoded in inter-event intervals, giving lossless video compression through temporal redundancy suppression (compression factors peak at 1000 for static scenes), 143 dB static intra-scene dynamic range (125 dB at 30 fps equivalent temporal resolution), 0.25% rms FPN and 56 dB SNR. Temporal events and absolute light measurement. 16
  17. 17. Frames vs Events 17
  18. 18. Why Event Based Sensors? Chaotic pendulum tracking 18
  19. 19. Event Time-based Sensor: Grey Levels Events 19
  20. 20. Why Event Based Sensors? CCAM sensors provide frame-free visual information: the number of events depends on the dynamics of the scene, whereas for standard cameras the amount of data is constant. 20
  21. 21. Event Cameras 21
  22. 22. Event cameras have become a commodity Event Cameras 22
  23. 23. How to Process Events? 23
  24. 24. How to Process Events? 24
  25. 25. Event Computation: Incremental (Time) vs. Batch (Frame). Incremental: update the previous result incrementally for the single pixel change. Batch: compute the new one-pixel-change frame and recompute the result using the whole frame. Embedded figures: event-based intensity reconstruction with and without manifold regularisation, compared to Bardow et al. (more detail and smoother gray-value variations in untextured areas), and the block diagram of a steering-angle prediction network in which the output of the event camera is collected over a time interval T into two-channel event frames (separate histograms h+ and h- for positive and negative polarity, kept in separate channels to avoid information loss through cancellation) and processed by a ResNet-inspired network. 25
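The batch side of the comparison on slide 25, as described in the embedded steering-prediction figure, accumulates events over a fixed interval T into a two-channel frame: separate histograms h+ and h- for positive and negative polarity. A minimal sketch of that accumulation follows; the function name, array layout and toy event stream are assumptions, not the paper's code.

```python
import numpy as np

def events_to_frame(events, height, width, t_start, T):
    """Accumulate events (x, y, t, polarity) falling in [t_start, t_start + T)
    into a two-channel histogram: channel 0 counts ON events, channel 1 OFF."""
    frame = np.zeros((2, height, width), dtype=np.int32)
    for x, y, t, p in events:
        if t_start <= t < t_start + T:
            channel = 0 if p > 0 else 1   # keep polarities separate to avoid cancellation
            frame[channel, y, x] += 1
    return frame

# Toy stream: (x, y, t_us, polarity)
stream = [(10, 5, 100, +1), (10, 5, 150, -1), (11, 5, 220, +1)]
print(events_to_frame(stream, height=16, width=16, t_start=0, T=500)[:, 5, 10:12])
```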
  26. 26. Applications: Event Stereovision • Matching pixels is hard • Changing lighting conditions, occlusions, motion blur…. 26
  27. 27. • Matching binocular events only using the time of events • Two events arriving at the same time and fulfilling geometric constraints are matched. Event Stereovision. P. Rogister, R.B. Benosman, S-H. Ieng, P. Lichtsteiner, T. Delbruck, Asynchronous Event-based Binocular Stereo Matching, IEEE Transactions on Neural Networks 23(2): 347-353, 23 December 2011, DOI: 10.1109/TNNLS.2011.2180025. 27
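A minimal sketch of the matching rule stated on slide 27: pair left and right events that arrive within a small temporal window and satisfy a geometric constraint, simplified here to agreement of the row coordinate in rectified cameras. The thresholds, the greedy search and the rectified-row shortcut are illustrative assumptions, not the algorithm of the cited paper.

```python
def match_stereo_events(left, right, dt_max, dy_max=0):
    """left/right: lists of (x, y, t) events from two rectified cameras.
    Match events whose timestamps differ by less than dt_max and whose
    rows agree within dy_max (a stand-in for the epipolar constraint)."""
    matches, used = [], set()
    for xl, yl, tl in left:
        best, best_dt = None, dt_max
        for j, (xr, yr, tr) in enumerate(right):
            if j in used or abs(yl - yr) > dy_max:
                continue
            dt = abs(tl - tr)
            if dt < best_dt:              # keep the closest-in-time candidate
                best, best_dt = j, dt
        if best is not None:
            used.add(best)
            matches.append(((xl, yl, tl), right[best]))
    return matches

left = [(40, 12, 1000), (41, 12, 1005)]
right = [(32, 12, 1002), (33, 12, 1006)]
print(match_stereo_events(left, right, dt_max=5))
```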
  28. 28. S.H. Ieng, J. Carneiro, M. Osswald, R.B. Benosman, Neuromorphic Event-Based Generalized Time-based Stereovision, Frontiers in Neuroscience 12(442), 18 July 2018, DOI: 10.3389/fnins.2018.00442. Event Stereovision. 28
  29. 29. Courtesy Chronocam 2016 29
  30. 30. Visual Odometry Courtesy Chronocam 2016 30
  31. 31. Event-Based Visual Flow. Let $e(\mathbf{p}, t) = (\mathbf{p}, t)^T$ denote an event at pixel position $\mathbf{p} = (x, y)^T$ and time $t$. Define the surface of active events $\Sigma_e : \mathbb{R}^2 \to \mathbb{R}^3$, $\mathbf{p} \mapsto t = \Sigma_e(\mathbf{p})$, mapping each pixel to the time of its most recent event. Time being strictly increasing, $\Sigma_e$ is a monotonically increasing surface with nonzero derivatives at any point, so locally $\Sigma_e(\mathbf{p} + \Delta\mathbf{p}) = \Sigma_e(\mathbf{p}) + \nabla\Sigma_e^T \Delta\mathbf{p} + o(\lVert \Delta\mathbf{p} \rVert)$, and the inverse function theorem gives $\frac{\partial \Sigma_e}{\partial x}(x, y_0) = \frac{1}{v_x(x, y_0)}$ and $\frac{\partial \Sigma_e}{\partial y}(x_0, y) = \frac{1}{v_y(x_0, y)}$, i.e. $\nabla\Sigma_e = (\frac{1}{v_x}, \frac{1}{v_y})^T$: the gradient of the surface of event times provides the inverse of the pixel velocity of events. From R. Benosman, C. Clercq, X. Lagorce, S.-H. Ieng, C. Bartolozzi, Event-Based Visual Flow, IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 2, p. 407, February 2014. Visual flow. 31
  32. 32. Event Flow 32
  33. 33. • High temporal resolution generates smooth space-time surfaces • The slope of the local surface contains the orientation and amplitude of the optical flow Motion estimation: optical flow Event Flow 33
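A minimal sketch of the idea on slides 31-33: fit a plane t ≈ a·x + b·y + c to the latest event times in a small neighbourhood of the surface of active events; the fitted slopes a and b are the inverse velocity components 1/vx and 1/vy. The neighbourhood size, the least-squares fit and the toy surface are illustrative choices, not the exact pipeline of the Event-Based Visual Flow paper.

```python
import numpy as np

def local_flow(time_surface, x, y, radius=2):
    """Estimate (vx, vy) at pixel (x, y) from the surface of latest event times
    by a least-squares plane fit t = a*x + b*y + c; slopes are inverse velocities."""
    h, w = time_surface.shape
    xs, ys, ts = [], [], []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            u, v = x + dx, y + dy
            if 0 <= u < w and 0 <= v < h and np.isfinite(time_surface[v, u]):
                xs.append(u); ys.append(v); ts.append(time_surface[v, u])
    if len(ts) < 3:
        return None                                   # not enough events to fit a plane
    A = np.column_stack([xs, ys, np.ones(len(ts))])
    (a, b, _), *_ = np.linalg.lstsq(A, np.array(ts), rcond=None)
    vx = 1.0 / a if abs(a) > 1e-9 else float("inf")   # near-zero slope: no motion along x
    vy = 1.0 / b if abs(b) > 1e-9 else float("inf")
    return vx, vy

# Toy surface: an edge sweeping right at 2 pixels per time unit
S = np.full((8, 8), np.nan)
for col in range(8):
    S[:, col] = col / 2.0        # event time grows with x, so motion is along +x
print(local_flow(S, 4, 4))
```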
  34. 34. R. Benosman, C. Clercq, X. Lagorce, S.-H. Ieng, C. Bartolozzi, Event-Based Visual Flow, IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 2, p. 407, February 2014: dense visual flow computed from the precise timings of spikes of an asynchronous event-based retina, using a local differential approach on the surface defined by coactive events; the high data sparseness and temporal resolution of event-based acquisition allow motion flow to be computed with microsecond accuracy at very low computational cost. Visual flow. 34
  35. 35. Tracking Real-Time Outdoor Scenes Z. Ni, S.H. Ieng, C. Posch, S. Regnier, R.B. Benosman, Visual Tracking using Neuromorphic Asynchronous Event-based Cameras, 24 February 2015 Neural Computation 27(4) :925-53, DOI : 10.1162/NECO-a-00720 35
  36. 36. Embedded result: errors are more equally shared between the van and the car thanks to the more reliable event attribution. Figure 16: the events generated by the van and the car during 3 s are fitted into two planes, denoted as P1 in blue and P2 in red; n1 and n2 are the surface normals. Tracking Real-Time Outdoor Scenes. Z. Ni, S.H. Ieng, C. Posch, S. Regnier, R.B. Benosman, Visual Tracking using Neuromorphic Asynchronous Event-based Cameras, Neural Computation 27(4): 925-953, 24 February 2015, DOI: 10.1162/NECO-a-00720. 36
  37. 37. Event-Based Tracking with a Moving Camera Z. Ni, S.H. Ieng, C. Posch, S. Regnier, R.B. Benosman, Visual Tracking using Neuromorphic Asynchronous Event-based Cameras, 24 February 2015 Neural Computation 27(4) :925-53, DOI : 10.1162/NECO-a-00720 37
  38. 38. Event-Based 3D Tracking and Pose Estimation D. Reverter-Valeiras, G. Orchard, S.H. Ieng, R.B. Benosman,Neuromorphic Event-Based 3D Pose Estimation, 2016January 22, Frontiers in Neuromorphic 9(522), DOI : 10.3389/fnins.2015.00522 38
  39. 39. Low Power and Latency Streaming 39
  40. 40. Asynchronous Event-Based Fourier Analysis. Q. Sabatier, S.H. Ieng, R.B. Benosman, Asynchronous Event-based Fourier Analysis, IEEE Transactions on Image Processing 26(5): 2192-2202, 6 February 2017, DOI: 10.1109/TIP.2017.2661702. 40
  41. 41. Last Two Decades: Rethinking Computer Vision in The Time Domain
  42. 42. Deep Temporal Learning: Time Surfaces. Figure panels: (a) event-driven time-based vision sensor (ATIS or DVS), (b) events from the sensor (ON and OFF), (f) time surface: a context whose surface amplitude is plotted over X (spatial) and Y (spatial) around the current pixel. X. Lagorce, G. Orchard, F. Galluppi, B. E. Shi, R.B. Benosman, HOTS: A Hierarchy Of event-based Time-Surfaces for pattern recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 39(7): 1346-1359, 11 July 2016, DOI: 10.1109/TPAMI.2016.2574707. 42
  43. 43. Fig. 4 of the HOTS paper: from left to right, a moving digit (a) is presented to the ATIS camera (b), which produces ON and OFF events (c) that are fed into Layer 1; the events are convolved with exponential kernels of time constant τ1 (d) to build event contexts from spatial receptive fields of side-length (2R1 + 1); these contexts are clustered into N1 features (e); when a feature is matched it produces an event (f), and events from the N1 features constitute the output of the layer (g). Each layer k takes input from its previous layer and feeds the next by repeating steps (d)-(g), using a receptive field of side-length (2Rk + 1) around each pixel and comparing event contexts to the layer's features. Panels: (a) event-driven time-based vision sensor (ATIS or DVS), (b) events from the sensor, (c) spatio-temporal domain, (d) time context, (e) exponential kernels, (f) time surface; clustering. 43
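A minimal sketch of the time-surface construction summarized on slides 42-43: keep the time of the most recent event per pixel and polarity, and at each incoming event read out an exponentially decayed view exp(-(t - t_last)/τ) of a (2R+1)x(2R+1) neighbourhood. Class and parameter names are assumptions; the defaults loosely follow the R1 = 4, τ1 = 50 ms values quoted later in the transcript.

```python
import numpy as np

class TimeSurface:
    """Per-pixel, per-polarity memory of the most recent event time; the
    surface around an incoming event is exp(-(t - t_last)/tau)."""

    def __init__(self, height, width, radius=4, tau=50e-3):
        self.last = np.full((2, height, width), -np.inf)   # last event time per polarity
        self.R, self.tau = radius, tau

    def update_and_get(self, x, y, t, polarity):
        p = 0 if polarity > 0 else 1
        self.last[p, y, x] = t
        y0, y1 = y - self.R, y + self.R + 1
        x0, x1 = x - self.R, x + self.R + 1
        patch = self.last[p, max(y0, 0):y1, max(x0, 0):x1]
        return np.exp(-(t - patch) / self.tau)              # values in (0, 1]; 0 if never active

ts = TimeSurface(height=32, width=32)
ts.update_and_get(10, 10, 0.010, +1)
surface = ts.update_and_get(11, 10, 0.020, +1)
print(surface.max(), surface.shape)
```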
  44. 44. Fig. 4 of the HOTS paper, continued: the second layer repeats the same operation on the output events of Layer 1, building contexts with exponential kernels of time constant τ2 over its own receptive fields and clustering them into the Layer 2 features. 44
  45. 45. Fig. 4 of the HOTS paper, continued: the third layer applies the same convolution and clustering with exponential kernels of time constant τ3. 45
  46. 46. Going deeper: HOTS results. Each stimulus is summarized by a signature, the trained histogram of feature activations of the last layer (X-axis: feature index, Y-axis: number of activations during the presentation, with snapshots accumulating ON events as white dots and OFF events as black dots); classification compares signatures using standard, normalized and Bhattacharyya distances, the recognized class being the one with the smallest distance. On a letters & digits dataset all distances give 100 percent recognition, and every one of one hundred cross-validation trials also reaches 100 percent. A second experiment uses the faces of seven subjects, each moving the head to follow a square path on a monitor, acquired with an ATIS camera (20 presentations per subject, one used for training, the others for testing), with layer parameters R1 = 4, t1 = 50 ms, N1 = 8 and growth factors KR = 2, Kt = 5, KN = 2. 46
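The recognition step summarized on slide 46 compares a histogram of feature activations against stored per-class signatures using, among others, the Bhattacharyya distance. A minimal sketch of that comparison follows; the signature values and function names are made up for illustration.

```python
import numpy as np

def bhattacharyya(p, q):
    """Bhattacharyya distance between two histograms (normalized internally)."""
    p = p / p.sum()
    q = q / q.sum()
    return -np.log(np.sum(np.sqrt(p * q)) + 1e-12)

def classify(activation_histogram, class_signatures):
    """Return the class whose stored signature is closest to the histogram
    of feature activations produced by the last layer."""
    return min(class_signatures,
               key=lambda c: bhattacharyya(activation_histogram, class_signatures[c]))

signatures = {"A": np.array([10.0, 2.0, 1.0, 0.5]),
              "B": np.array([1.0, 9.0, 3.0, 2.0])}
print(classify(np.array([8.0, 3.0, 1.0, 1.0]), signatures))
```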
  47. 47. Deep Temporal Learning with Adaptive Temporal Feedback: Temporal Surfaces X. Lagorce, G. Orchard, F. Galluppi, B. E. Shi, R.B. Benosman, HOTS : A Hierarchy Of event- based Time-Surfaces for pattern recognition, 2016 July 11, IEEE Transaction on Pattern Analysis and Machine Intelligence 39(7) :1346-1359, doi :10.1109/TPAMI.2016.2574707 47
  48. 48. Deep Temporal Learning with Adaptive Temporal Feedback: Temporal Surfaces X. Lagorce, G. Orchard, F. Galluppi, B. E. Shi, R.B. Benosman, HOTS : A Hierarchy Of event- based Time-Surfaces for pattern recognition, 2016 July 11, IEEE Transaction on Pattern Analysis and Machine Intelligence 39(7) :1346-1359, doi :10.1109/TPAMI.2016.2574707 48
  49. 49. Computation Platforms? Two tendencies: Biomimetism, and Understand and Accelerate. 49
  50. 50. From: Goi, E., Zhang, Q., Chen, X. et al. Perspective on photonic memristive neuromorphic computing. PhotoniX 1, 3 (2020). https://doi.org/10.1186/s43074-020-0001-6. Computation Platforms? Replicate. 50
  51. 51. Analog vs Digital. BrainScaleS (https://wiki.ebrains.eu/bin/view/Collabs/neuromorphic/BrainScaleS/): ~4 million bio-realistic neurons, 880 million learning synapses, 10^5 faster than real time, 164 Giga-events/s (normal) and 1 Tera-event/s (burst), several hundreds of kW [2010]. SpiNNaker: 1 million processors, 200 million million actions per second. 51
  52. 52. Neuromorphic Computing [from The Scientist, 2019] Today’s digital neuromorphic hardware 52
  53. 53. Understand: SPEAR-1, a processing solution adapted to event data. Block diagram: event inputs from vision, LIDAR, radar, audio and IMU sensors (analog and digital), I/O, memory, AER communication modules, event-based processing and an event-based machine learning module. • Architecture for event-per-event, programmable event-based computation and machine learning • Industry standard I/O and programmability • Scalable, enabling fast and cost-effective derivatives. <10-100 mW and up to 20 GigaEvents/s processing. 53
  54. 54. Radar, LIDAR and much more… https://www.summerrobotics.ai. Embedded example: event-based structured light. A coded light pattern is projected onto a scene about one meter away and observed by an event camera; each projected element flickers with a unique frequency or duty cycle, so local neighbourhoods of estimated duty cycles form codewords that match paired points between projector and camera views, and triangulating the rays from the two optical centers yields a 3D point cloud and reconstructed depth. A second embedded excerpt shows an FMCW radar gesture acquisition setup and its chirp and range-profile parameters. 54
  55. 55. Sight Restoration: Prosthetics and Optogenetics • Development of Retina Stimulation Goggles • 3 generations of Retina Prosthetics • Asynchronous Retina Stimulation: Prosthetics and Optogenetics IRIS 2 PRIMA 55
  56. 56. Sight Restoration: Prosthetics and Optogenetics 56
  57. 57. & much more… Decision making: game theory, stock market. Low power online decoding and classification. Robotics. ADAS. Sensory substitution. Always-on sensing. Space awareness. 57
  58. 58. Conclusions? Where we came from… A chance to move perception from engineering to basic science: observe, understand, model, application. A chance to ground perception in basic science! • A whole new world to explore • A deep paradigm shift for Sensing & AI • Novel sensors to build • New adapted processing architectures to design 58
  59. 59. Conclusions • A whole new world to explore • A deep paradigm shift for Sensing & AI • Novel sensors to build • New adapted processing architectures to design Physiology - Models - Hardware - Prosthetics - Robotics - Computation 59
