3D Scene Accessibility for the Blind via Auditory-Multitouch Interfaces

  1. 3D Scene Accessibility for the Blind via Auditory-Multitouch Interfaces. Juan D. Gomez, Sinan Mohammed, Guido Bologna and Thierry Pun. University of Geneva, Computer Vision & Multimedia Lab (CVML). 28-30 November 2011, Brussels, Belgium.
  2. “Object Detection”: the annual PASCAL Visual Object Classes (VOC) Challenge.
  3. “Object Detection”: the annual PASCAL Visual Object Classes (VOC) Challenge.
  4. V. Hedau, D. Hoiem, D. Forsyth, “Recovering the Spatial Layout of Cluttered Rooms,” IEEE International Conference on Computer Vision (ICCV), 2009.
  5. S. Y. Bao, M. Sun, S. Savarese, “Coherent Object Detection and Scene Layout Understanding,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2010.
  6. Gomez, J., Mohammed, S., Bologna, G. and Pun, T., “Toward 3D Scene Understanding via Audio-description: Kinect-iPad fusion for the visually impaired,” International Conference on Computers and Accessibility (ASSETS), 2011. (Figure: preliminary target scene with a triangle, a circle, a square and a cylinder; 40 cm scale marker.)
  7. Gomez, J., Bologna, G. and Pun, T., “A virtual ceiling mounted depth-camera using orthographic kinect,” IEEE International Conference on Computer Vision (ICCV), 2011. One-Shot Semiautomatic Kinect Calibration. (Figure panels: before calibration; after calibration.)
  8. Gomez, J., Mohammed, S., Bologna, G. and Pun, T., “Toward 3D Scene Understanding via Audio-description: Kinect-iPad fusion for the visually impaired,” International Conference on Computers and Accessibility (ASSETS), 2011. Element Extraction via Depth-Based Segmentation. (Figure panels: layering across the depth; layers in which an object was detected after scanning; object-less image.) A depth-layer segmentation sketch follows this list.
  9. Gomez, J., Mohammed, S., Bologna, G. and Pun, T., “Toward 3D Scene Understanding via Audio-description: Kinect-iPad fusion for the visually impaired,” International Conference on Computers and Accessibility (ASSETS), 2011. Neural-Based Object Recognition. Four features per object, all scale- and rotation-invariant, with values in [0, 1] and equal weights of 1 so that every feature carries the same importance:
     |1 - (majorAxisLength - minorAxisLength) / majorAxisLength|
     perimeter / (majorAxisLength * pi)
     |(pi * radius^2 - area) / area|
     |1 - |pi * majorAxisLength - perimeter| / perimeter|
     (A feature-computation sketch follows this list.)
  10. Gomez, J., Mohammed, S., Bologna, G. and Pun, T., “Toward 3D Scene Understanding via Audio-description: Kinect-iPad fusion for the visually impaired,” International Conference on Computers and Accessibility (ASSETS), 2011. Early Scene Description. So far, the frontal view gives only a relative layout understanding; a top view of the scene is highly desirable to grasp the scene distribution. Whereas frontal distances (depths) are known, lateral distances are still missing. How can all this information be delivered to the blind user?
  11. Gomez, J., Mohammed, S., Bologna, G. and Pun, T., “Toward 3D Scene Understanding via Audio-description: Kinect-iPad fusion for the visually impaired,” International Conference on Computers and Accessibility (ASSETS), 2011. Delivering Visual Information via Finger-Triggered Audio. (Figure panels: natural top-view of the scene; artificial top-view of the scene; target sensation to be achieved on the iPad; iPad holding the artificial top-view; target sensation of spatial audio.) A touch-to-audio sketch follows this list.
  12. Gomez, J., Bologna, G. and Pun, T., “A virtual ceiling mounted depth-camera using orthographic kinect,” IEEE International Conference on Computer Vision (ICCV), 2011. Deceptive Object Location Caused by Perspective: it causes mistaken spatial sonification, and the top view remains unreachable despite depth. (Figure: vanishing point and scene optical geometry example.)
  13. Gomez, J., Bologna, G. and Pun, T., “A virtual ceiling mounted depth-camera using orthographic kinect,” IEEE International Conference on Computer Vision (ICCV), 2011. Orthographic vs. Perspective Cameras. A perspective camera (bottom right): objects further away appear smaller, and their positions vary with distance. An orthographic camera (top left): objects preserve their natural size and position. (A projection sketch follows this list.)
  14. Gomez, J., Bologna, G. and Pun, T., “A virtual ceiling mounted depth-camera using orthographic kinect,” IEEE International Conference on Computer Vision (ICCV), 2011. Top-View Based on a Virtual Orthographic Camera.
  15. Gomez, J., Bologna, G. and Pun, T., “A virtual ceiling mounted depth-camera using orthographic kinect,” IEEE International Conference on Computer Vision (ICCV), 2011. Top-View Based on a Virtual Orthographic Camera.
  16. Gomez, J., Bologna, G. and Pun, T., “A virtual ceiling mounted depth-camera using orthographic kinect,” IEEE International Conference on Computer Vision (ICCV), 2011. Top-View Based on a Virtual Orthographic Camera. (Figure panels: natural depth map from above using the virtual orthographic Kinect; artificial top-view using the virtual orthographic Kinect and the object recognition methods.) A top-view sketch follows this list.
  17. Gomez, J., Mohammed, S., Bologna, G. and Pun, T., “Scene accessibility for the blind via computer-vision and multi-touch interfaces,” Conference on Open Accessibility Everywhere (AEGIS), 2011. Experiments with Blindfolded Users. (Figure panels: original layout; user guess; centroid shifting.)
  18. Gomez, J., Mohammed, S., Bologna, G. and Pun, T., “Scene accessibility for the blind via computer-vision and multi-touch interfaces,” Conference on Open Accessibility Everywhere (AEGIS), 2011. Results. The x axis represents 30 different scenes with four elements each. The y axis represents the average distance (cm) between the original and the final location of the four objects, normalized by dividing it by the table diagonal (244 cm). The colors of the bars (scenes) vary with exploration time, from 0 to 10 minutes (colormap). Each bar shows on top the standard deviation of the four elements' relocation. (An error-metric sketch follows this list.)
  19. Gomez, J., Mohammed, S., Bologna, G. and Pun, T., “Scene accessibility for the blind via computer-vision and multi-touch interfaces,” Conference on Open Accessibility Everywhere (AEGIS), 2011. Conclusions. The mean error distance in object replacement across all experiments was 3.3% of the table diagonal, i.e. around 8.5 cm between an object's original position and its relocation. In both cases, i.e. scenes with three and with four objects, this distance remained roughly invariant. The exploration time varied with the number of elements on the table: on average, 3.4 minutes were enough to build the layout in mind for a scene of three elements, whereas for scenes with four elements this time reached 5.4 minutes. The difference is due to the larger number of sound-color associations to be learned; the results nonetheless showed no misclassification of objects. Overall, the results reveal that the participants were capable of grasping the general spatial structure of the sonified environments and of accurately estimating scene layouts.
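
Slide 8's element extraction can be read as slicing the depth map into layers and keeping the connected blobs found in each layer. A minimal Python sketch of that idea, assuming a Kinect depth map in millimetres; the layer thickness and minimum blob size are illustrative values, not taken from the paper.

    import numpy as np
    from scipy import ndimage

    def segment_by_depth_layers(depth_mm, layer_mm=100, min_pixels=200):
        """Slice a depth map into layers and collect the blobs found in each."""
        valid = depth_mm > 0                      # 0 means no Kinect reading
        blobs = []
        for near in range(0, int(depth_mm[valid].max()), layer_mm):
            # Binary mask of pixels falling inside this depth layer.
            layer = valid & (depth_mm >= near) & (depth_mm < near + layer_mm)
            labels, n = ndimage.label(layer)
            for i in range(1, n + 1):
                mask = labels == i
                if mask.sum() >= min_pixels:      # keep plausible objects only
                    blobs.append({"depth_mm": near, "mask": mask})
        return blobs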
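
The four features on slide 9 follow directly from standard region properties. A sketch assuming scikit-image's regionprops; the slide does not define “radius”, so half the major axis length is assumed here.

    import numpy as np
    from skimage.measure import label, regionprops

    def shape_features(binary_mask):
        """The four scale- and rotation-invariant features of slide 9."""
        props = regionprops(label(binary_mask))[0]
        major = props.major_axis_length
        minor = props.minor_axis_length
        perim = props.perimeter
        area = props.area
        radius = major / 2.0                       # assumption: half major axis
        f1 = abs(1 - (major - minor) / major)              # elongation
        f2 = perim / (major * np.pi)                       # perimeter vs. circle
        f3 = abs((np.pi * radius ** 2 - area) / area)      # area vs. circle
        f4 = abs(1 - abs(np.pi * major - perim) / perim)   # circumference match
        # The slide states values lie in [0, 1]; clip defensively anyway.
        return np.clip(np.array([f1, f2, f3, f4]), 0.0, 1.0)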
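
One plausible reading of slide 11's finger-triggered audio: the touch position indexes into the artificial top-view, and the object under the finger is sonified with a stereo pan matching its lateral position. A hedged sketch; the label map, sound table and equal-power pan law are illustrative choices, not the authors' implementation.

    import math

    def sonify_touch(label_map, sounds, x, y):
        """Return (sound, left_gain, right_gain) for a touch at pixel (x, y).

        label_map -- 2D list of object ids (0 = empty) from the top-view
        sounds    -- dict mapping object id -> sound identifier
        """
        obj = label_map[y][x]
        if obj == 0:
            return None                    # finger rests on empty table
        width = len(label_map[0])
        pan = x / (width - 1)              # 0 = far left, 1 = far right
        # Equal-power pan law keeps perceived loudness constant.
        left = math.cos(pan * math.pi / 2)
        right = math.sin(pan * math.pi / 2)
        return sounds[obj], left, right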
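
Slide 13's contrast fits in two lines of algebra: a perspective camera divides by depth (u = f*X/Z), so projected position drifts with distance, while an orthographic camera drops depth entirely (u = s*X). A small numeric illustration with an assumed focal length.

    import numpy as np

    f = 525.0                                # assumed focal length in pixels
    points = np.array([[0.5, 0.2, 1.0],      # same lateral offset ...
                       [0.5, 0.2, 2.0]])     # ... but twice as far away

    persp = f * points[:, :2] / points[:, 2:3]   # u = f*X/Z, v = f*Y/Z
    ortho = f * points[:, :2]                    # u = s*X,   v = s*Y

    print(persp)   # rows differ: the far point slides toward the image centre
    print(ortho)   # rows identical: position is independent of depth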
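
A sketch of the virtual orthographic top-view of slide 16, under assumptions the slides do not state: the depth map is metric, the camera looks horizontally across the table, and the intrinsics are typical Kinect defaults. Each pixel is back-projected to 3D and dropped straight down onto the ground plane.

    import numpy as np

    def top_view(depth_m, fx=525.0, cx=319.5, cell_m=0.01,
                 width_m=2.0, depth_range_m=3.0):
        """Orthographic top-view occupancy grid (X-Z plane) of a depth map."""
        h, w = depth_m.shape
        u = np.arange(w, dtype=float)
        z = depth_m                              # metres, 0 = no reading
        x = (u - cx) * z / fx                    # back-project pixels to metres
        grid = np.zeros((int(depth_range_m / cell_m),
                         int(width_m / cell_m)), dtype=bool)
        ok = (z > 0) & (np.abs(x) < width_m / 2) & (z < depth_range_m)
        col = ((x[ok] + width_m / 2) / cell_m).astype(int)
        row = (z[ok] / cell_m).astype(int)
        grid[row, col] = True                    # drop each point onto the plane
        return grid                              # the scene as seen from above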
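
The metric of slide 18, restated as code: per scene, the mean centroid displacement between original and relocated objects, divided by the 244 cm table diagonal. The example coordinates are invented, chosen only so the result lands near the reported 3.3%.

    import numpy as np

    DIAGONAL_CM = 244.0    # table diagonal, from slide 18

    def normalized_scene_error(original, relocated):
        """Mean object displacement (cm) divided by the table diagonal."""
        shift = np.asarray(original, float) - np.asarray(relocated, float)
        return np.linalg.norm(shift, axis=1).mean() / DIAGONAL_CM

    # Four objects, each displaced by roughly 8 cm -> about 3.3 %.
    orig = [(20, 30), (100, 40), (150, 120), (60, 90)]
    guess = [(26, 35), (108, 41), (154, 113), (66, 95)]
    print(f"{normalized_scene_error(orig, guess):.1%}")   # prints 3.3%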
