Natural Interaction for Augmented Reality Applications


Keynote talk given by Mark Billinghurst from the HIT Lab NZ at the IVCNZ 2013 conference, November 28th 2013. The talk focuses on Natural Interaction with Augmented Reality applications using speech and gesture and demonstrates some of the projects in this area developed by the HIT Lab NZ.

  1. 1. Natural Interaction for Augmented Reality Applications Mark Billinghurst mark.billinghurst@hitlabnz.org The HIT Lab NZ, University of Canterbury November 28th 2013
  2. 2. 1977 – Star Wars
  3. 3. Augmented Reality Definition   Defining Characteristics   Combines Real and Virtual Images -  Both can be seen at the same time   Interactive in real-time -  The virtual content can be interacted with   Registered in 3D -  Virtual objects appear fixed in space Azuma, R. T. (1997). A survey of augmented reality. Presence, 6(4), 355-385.
  4. 4. Augmented Reality Today
  5. 5. AR Interface Components   Physical Elements (Input) → Interaction Metaphor → Virtual Elements (Output)   Key Question: How should a person interact with the Augmented Reality content?   Connecting physical and virtual with interaction
  6. 6. AR Interaction Metaphors   Information Browsing   View AR content   3D AR Interfaces   3D UI interaction techniques   Augmented Surfaces   Tangible UI techniques   Tangible AR   Tangible UI input + AR output
  7. 7. Tangible User Interfaces   Use physical objects to interact with digital content   Foreground   graspable user interface   Background   ambient interfaces Ishii, H., & Ullmer, B. (1997). Tangible bits: towards seamless interfaces between people, bits and atoms. In Proceedings of the ACM SIGCHI Conference on Human factors in computing systems (pp. 234-241). ACM.
  8. 8. TUI Benefits and Limitations   Pros   Physical objects make us smart   Objects aid collaboration   Objects increase understanding   Cons   Difficult to change object properties   Limited display capabilities – 2D view   Separation between object and display
  9. 9. Tangible AR Metaphor   AR overcomes limitations of TUIs   enhance display possibilities   merge task/display space   provide public and private views   TUI + AR = Tangible AR   Apply TUI methods to AR interface design
  10. 10. VOMAR Demo (Kato 2000)   AR Furniture Arranging   Elements + Interactions   Book: -  Turn over the page   Paddle: -  Push, shake, incline, hit, scoop Kato, H., Billinghurst, M., et al. 2000. Virtual Object Manipulation on a Table-Top AR Environment. In Proceedings of the International Symposium on Augmented Reality (ISAR 2000), Munich, Germany, 111--119.
  11. 11. Lessons Learned   Advantages   Intuitive interaction, ease of use   Full 6 DOF manipulation   Disadvantages   Marker based tracking -  occlusion, limited tracking range, etc   Needs external interface objects -  Paddle, book, etc
  12. 12. 2012 – Iron Man
  13. 13. To Make the Vision Real..   Hardware/software requirements   Contact lens displays   Free space hand/body tracking   Speech/gesture recognition   Etc..   Most importantly   Usability/User Experience
  14. 14. Natural Interaction   Automatically detecting real environment   Environmental awareness, Physically based interaction   Gesture interaction   Free-hand interaction   Multimodal input   Speech and gesture interaction   Intelligent interfaces   Implicit rather than Explicit interaction
  15. 15. Environmental Awareness
  16. 16. AR MicroMachines   AR experience with environment awareness and physically-based interaction   Based on MS Kinect RGB-D sensor   Augmented environment supports   occlusion, shadows   physically-based interaction between real and virtual objects Clark, A., & Piumsomboon, T. (2011). A realistic augmented reality racing game using a depth-sensing camera. In Proceedings of the 10th International Conference on Virtual Reality Continuum and Its Applications in Industry (pp. 499-502). ACM.
  17. 17. Operating Environment
  18. 18. Architecture   Our framework uses five libraries:   OpenNI   OpenCV   OPIRA   Bullet Physics   OpenSceneGraph
  19. 19. System Flow   The system flow consists of three sections:   Image Processing and Marker Tracking   Physics Simulation   Rendering
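A minimal sketch of that three-stage flow as a per-frame routine is given below; all type and function names are hypothetical placeholders for the OpenNI, OPIRA, Bullet and OpenSceneGraph calls the surrounding slides describe.

```cpp
// Sketch of the three-stage per-frame flow: (1) image processing and marker
// tracking, (2) physics simulation, (3) rendering. All names are placeholders.
struct Frame { /* RGB image + depth map from the sensor */ };
struct CameraPose { /* 6-DOF camera pose from marker tracking */ };

Frame      captureFrame();                                    // OpenNI/OpenCV capture
CameraPose trackMarkers(const Frame& f);                      // OPIRA-style registration
void       updateDepthMesh(const Frame& f);                   // rebuild real-world collision mesh
void       stepPhysics(float dt);                             // advance the Bullet simulation
void       renderScene(const CameraPose& p, const Frame& f);  // OSG rendering with occlusion/shadows

void runFrame(float dt) {
    Frame frame = captureFrame();          // 1. image processing and marker tracking
    CameraPose pose = trackMarkers(frame);
    updateDepthMesh(frame);                // 2. physics simulation
    stepPhysics(dt);
    renderScene(pose, frame);              // 3. rendering
}
```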
  20. 20. Physics Simulation   Create a virtual mesh over the real world   Updated at 10 fps – real objects can be moved   Used by the physics engine for collision detection (virtual/real)   Used by OpenSceneGraph for occlusion and shadows
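A hedged sketch of building such a collision mesh with the Bullet API, assuming the depth image has already been converted into a grid of 3D points (the DepthGrid type and buildEnvironmentBody function are illustrative, not the actual framework code):

```cpp
// Turning a depth-derived point grid into a static Bullet collision mesh so
// virtual objects can collide with the real scene. Illustrative sketch only.
#include <btBulletDynamicsCommon.h>
#include <vector>

struct DepthGrid {
    int width, height;
    std::vector<btVector3> points;   // one 3D point per depth pixel
    const btVector3& at(int x, int y) const { return points[y * width + x]; }
};

btRigidBody* buildEnvironmentBody(const DepthGrid& grid) {
    // Triangulate neighbouring depth samples into a mesh.
    btTriangleMesh* mesh = new btTriangleMesh();
    for (int y = 0; y + 1 < grid.height; ++y) {
        for (int x = 0; x + 1 < grid.width; ++x) {
            mesh->addTriangle(grid.at(x, y), grid.at(x + 1, y), grid.at(x, y + 1));
            mesh->addTriangle(grid.at(x + 1, y), grid.at(x + 1, y + 1), grid.at(x, y + 1));
        }
    }
    // Static (mass 0) body: the real world pushes virtual objects, not vice versa.
    btBvhTriangleMeshShape* shape = new btBvhTriangleMeshShape(mesh, true);
    btDefaultMotionState* motion = new btDefaultMotionState(btTransform::getIdentity());
    btRigidBody::btRigidBodyConstructionInfo info(0.0f, motion, shape);
    return new btRigidBody(info);
}
// At ~10 fps the application would remove the old body from the dynamics world,
// rebuild it from the latest depth frame, and add the new one back.
```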
  21. 21. Rendering: Occlusion and Shadows
  22. 22. Gesture Interaction
  23. 23. Natural Hand Interaction   Using bare hands to interact with AR content   MS Kinect depth sensing   Real time hand tracking   Physics based simulation model
  24. 24. Hand Interaction   Represent models as collections of spheres   Bullet physics engine for interaction with real world
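A hedged sketch of that sphere-proxy approach with the Bullet API, assuming hand joint positions arrive from the Kinect-based tracker each frame (illustrative, not the actual system code):

```cpp
// Representing a tracked hand as kinematic sphere proxies in Bullet, so
// bare-hand motion can push dynamic virtual objects around.
#include <btBulletDynamicsCommon.h>
#include <vector>

// One kinematic sphere per tracked hand joint/segment.
std::vector<btRigidBody*> createHandProxies(btDiscreteDynamicsWorld* world,
                                            int numJoints, float radius) {
    std::vector<btRigidBody*> proxies;
    for (int i = 0; i < numJoints; ++i) {
        btSphereShape* shape = new btSphereShape(radius);
        btDefaultMotionState* motion = new btDefaultMotionState(btTransform::getIdentity());
        btRigidBody::btRigidBodyConstructionInfo info(0.0f, motion, shape);
        btRigidBody* body = new btRigidBody(info);
        // Kinematic: driven by the tracker, but still pushes dynamic objects.
        body->setCollisionFlags(body->getCollisionFlags() |
                                btCollisionObject::CF_KINEMATIC_OBJECT);
        body->setActivationState(DISABLE_DEACTIVATION);
        world->addRigidBody(body);
        proxies.push_back(body);
    }
    return proxies;
}

// Each frame, move every proxy to the latest tracked joint position.
void updateHandProxies(const std::vector<btRigidBody*>& proxies,
                       const std::vector<btVector3>& jointPositions) {
    for (size_t i = 0; i < proxies.size() && i < jointPositions.size(); ++i) {
        btTransform t = btTransform::getIdentity();
        t.setOrigin(jointPositions[i]);
        proxies[i]->getMotionState()->setWorldTransform(t);
    }
}
```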
  25. 25. Scene Interaction   Render AR scene with OpenSceneGraph   Using depth map for occlusion   Shadows yet to be implemented
  26. 26. Architecture   Pipeline layers (bottom to top): 1. Hardware Interface; 2. Segmentation; 3. Classification/Tracking; 4. Modeling (hand recognition/modeling, rigid-body modeling); 5. Gesture (static, dynamic, and context-based gestures)
  27. 27. Architecture – 1. Hardware Interface   Supports PCL, OpenNI, OpenCV, and the Kinect SDK   Provides access to depth, RGB, and XYZRGB data   Usage: capturing colour images, depth images and concatenated point clouds from a single camera or multiple cameras   Examples: Kinect for Xbox 360, Kinect for Windows, Asus Xtion Pro Live
  28. 28. Architecture – 2. Segmentation   Segments images and point clouds based on colour, depth and space   Usage: segmenting images or point clouds using colour models, depth, or spatial properties such as location, shape and size   Examples: skin colour segmentation, depth threshold
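The two examples named on this slide could be implemented along these lines with OpenCV; the HSV bounds and the 1 m depth cut-off are illustrative values, not the framework's actual parameters:

```cpp
// Two simple segmentation strategies: skin-colour segmentation in HSV space
// and a depth threshold that keeps only near objects (e.g. a hand).
#include <opencv2/opencv.hpp>

// Keep pixels whose colour falls inside a rough skin-tone range.
cv::Mat segmentSkin(const cv::Mat& bgr) {
    cv::Mat hsv, mask;
    cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);
    cv::inRange(hsv, cv::Scalar(0, 40, 60), cv::Scalar(25, 180, 255), mask);
    return mask;   // 255 where the pixel looks like skin, 0 elsewhere
}

// Keep depth pixels that are valid and closer than maxDepthMm.
cv::Mat segmentByDepth(const cv::Mat& depth16u, int maxDepthMm = 1000) {
    cv::Mat valid = depth16u > 0;
    cv::Mat nearMask = depth16u < maxDepthMm;
    cv::Mat mask;
    cv::bitwise_and(valid, nearMask, mask);
    return mask;
}
```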
  29. 29. Architecture – 3. Classification/Tracking   Identifies and tracks objects between frames based on XYZRGB   Usage: identifying the current position/orientation of the tracked object in space   Example: a training set of hand poses, where colours represent unique regions of the hand, with raw output (without cleaning) classified on real hand input (a depth image)
  30. 30. Architecture – 4. Modeling   Hand Recognition/Modeling: skeleton based (for a low-resolution approximation) or model based (for a more accurate representation)   Object Modeling: identification and tracking of rigid-body objects   Physical Modeling (physical interaction): sphere proxy, model based, or mesh based   Usage: general spatial interaction in AR/VR environments
  31. 31. Architecture – 5. Gesture   Static (hand pose recognition)   Dynamic (meaningful movement recognition)   Context-based (gestures with context, e.g. pointing)   Usage: issuing commands, anticipating user intention, and high-level interaction
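Taken together, the five layers form a pipeline that could be sketched as a set of interfaces like the following (class and type names are illustrative, not the actual HIT Lab NZ framework API):

```cpp
// Sketch of the five-layer gesture pipeline as plain interfaces.
#include <string>
#include <vector>

struct PointCloudFrame { /* depth + RGB + XYZRGB points (layer 1 output) */ };
struct SegmentedRegion { /* pixels/points belonging to one candidate object */ };
struct TrackedHand     { /* labelled hand regions with position/orientation */ };
struct HandModel       { /* skeleton or full hand model plus rigid bodies */ };
struct Gesture { std::string name; /* e.g. "point" or "grab", with parameters */ };

class HardwareInterface {          // 1. Kinect / Xtion capture
public:
    virtual PointCloudFrame capture() = 0;
    virtual ~HardwareInterface() = default;
};
class Segmenter {                  // 2. colour/depth/space segmentation
public:
    virtual std::vector<SegmentedRegion> segment(const PointCloudFrame&) = 0;
    virtual ~Segmenter() = default;
};
class Tracker {                    // 3. classification and frame-to-frame tracking
public:
    virtual TrackedHand track(const std::vector<SegmentedRegion>&) = 0;
    virtual ~Tracker() = default;
};
class Modeler {                    // 4. hand / rigid-body / physical modelling
public:
    virtual HandModel model(const TrackedHand&) = 0;
    virtual ~Modeler() = default;
};
class GestureRecognizer {          // 5. static / dynamic / context-based gestures
public:
    virtual std::vector<Gesture> recognize(const HandModel&) = 0;
    virtual ~GestureRecognizer() = default;
};
```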
  32. 32. Skeleton Based Interaction   3 Gear Systems   Kinect/Primesense Sensor   Two hand tracking   http://www.threegear.com
  33. 33. Skeleton Interaction + AR   HMD AR View   Viewpoint tracking   Two hand input   Skeleton interaction, occlusion
  34. 34. Multimodal Input
  35. 35. Multimodal Interaction   Combined speech and gesture input   Gesture and Speech complementary   Speech -  modal commands, quantities   Gesture -  selection, motion, qualities   Previous work found multimodal interfaces intuitive for 2D/3D graphics interaction
  36. 36. Free Hand Multimodal Input Point Move Pick/Drop   Use free hand to interact with AR content   Recognize simple gestures Lee, M., Billinghurst, M., Baek, W., Green, R., & Woo, W. (2013). A usability study of multimodal input in an augmented reality environment. Virtual Reality, 17(4), 293-305.
  37. 37. Multimodal Architecture
  38. 38. Multimodal Fusion
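As a rough, hypothetical illustration of multimodal fusion, a simple time-window approach that pairs a speech command with the most recent gesture might look like this (not the actual system described in the talk):

```cpp
// Time-window multimodal fusion: a speech command ("make that red") is paired
// with the most recent pointing/selection gesture to resolve its target.
#include <deque>
#include <optional>
#include <string>

struct GestureEvent { std::string type; int objectId; double timestamp; };
struct SpeechEvent  { std::string command; double timestamp; };
struct FusedCommand { std::string command; int objectId; };

class MultimodalFusion {
public:
    void onGesture(const GestureEvent& g) { gestures_.push_back(g); }

    // Pair a speech command with the latest gesture inside the time window.
    std::optional<FusedCommand> onSpeech(const SpeechEvent& s) {
        for (auto it = gestures_.rbegin(); it != gestures_.rend(); ++it) {
            if (s.timestamp - it->timestamp < windowSeconds_) {
                return FusedCommand{s.command, it->objectId};
            }
        }
        return std::nullopt;   // no recent gesture: command is incomplete
    }

private:
    std::deque<GestureEvent> gestures_;
    double windowSeconds_ = 2.0;   // illustrative fusion window
};
```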
  39. 39. Hand Occlusion
  40. 40. Experimental Setup   Change object shape and colour
  41. 41. User Evaluation   Change object shape, colour and position   Conditions   Speech only, gesture only, multimodal   Measure   performance time, error, subjective survey
  42. 42. Results   Average performance time (MMI, speech fastest)   Gesture: 15.44s   Speech: 12.38s   Multimodal: 11.78s   No difference in user errors   User subjective survey   Q1: How natural was it to manipulate the object? -  MMI, speech significantly better   70% preferred MMI, 25% speech only, 5% gesture only
  43. 43. Intelligent Interfaces
  44. 44. Intelligent Interfaces   Most AR systems stupid   Don’t recognize user behaviour   Don’t provide feedback   Don’t adapt to user   Especially important for training   Scaffolded learning   Moving beyond check-lists of actions
  45. 45. Intelligent Interfaces   AR interface + intelligent tutoring system   ASPIRE constraint based system (from UC)   Constraints -  relevance cond., satisfaction cond., feedback Westerfield, G., Mitrovic, A., & Billinghurst, M. (2013). Intelligent Augmented Reality Training for Assembly Tasks. In Artificial Intelligence in Education (pp. 542-551). Springer Berlin Heidelberg.
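A hedged sketch of what a constraint with a relevance condition, a satisfaction condition and feedback might look like for an assembly task (illustrative only, not the actual ASPIRE representation):

```cpp
// A constraint-based tutoring check: each constraint has a relevance condition,
// a satisfaction condition, and a feedback message shown when it is violated.
#include <functional>
#include <string>
#include <vector>

struct AssemblyState { /* which parts are placed, where, and in what order */ };

struct Constraint {
    std::function<bool(const AssemblyState&)> isRelevant;   // when the constraint applies
    std::function<bool(const AssemblyState&)> isSatisfied;  // what the correct state looks like
    std::string feedback;                                   // corrective hint for the learner
};

// Evaluate the user's current assembly state and collect corrective feedback.
std::vector<std::string> evaluate(const std::vector<Constraint>& constraints,
                                  const AssemblyState& state) {
    std::vector<std::string> messages;
    for (const auto& c : constraints) {
        if (c.isRelevant(state) && !c.isSatisfied(state)) {
            messages.push_back(c.feedback);
        }
    }
    return messages;
}
```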
  46. 46. Domain Ontology
  47. 47. Intelligent Feedback   Actively monitors user behaviour   Implicit vs. explicit interaction   Provides corrective feedback
  48. 48. Evaluation Results   16 subjects, with and without ITS   Improved task completion   Improved learning
  49. 49. Intelligent Agents   AR characters   Virtual embodiment of system   Multimodal input/output   Examples   AR Lego, Welbo, etc   Mr Virtuoso -  AR character more real, more fun -  On-screen 3D and AR similar in usefulness Wagner, D., Billinghurst, M., & Schmalstieg, D. (2006). How real should virtual characters be?. In Proceedings of the 2006 ACM SIGCHI international conference on Advances in computer entertainment technology (p. 57). ACM.
  50. 50. Looking to the Future What’s Next?
  51. 51. Directions for Future Research   Mobile Gesture Interaction   Tablet, phone interfaces   Wearable Systems   Google Glass   Novel Displays   Contact lens
  52. 52. Mobile Gesture Interaction   Motivation   Richer interaction with handheld devices   Natural interaction with handheld AR   2D tracking   Finger tip tracking   3D tracking [Hurst and Wezel 2013]   Hand tracking [Henrysson et al. 2007] Henrysson, A., Marshall, J., & Billinghurst, M. (2007). Experiments in 3D interaction for mobile phone AR. In Proceedings of the 5th international conference on Computer graphics and interactive techniques in Australia and Southeast Asia (pp. 187-194). ACM.
  53. 53. Fingertip Based Interaction Running System System Setup Mobile Client + PC Server Bai, H., Gao, L., El-Sana, J., & Billinghurst, M. (2013). Markerless 3D gesture-based interaction for handheld augmented reality interfaces. In SIGGRAPH Asia 2013 Symposium on Mobile Graphics and Interactive Applications (p. 22). ACM.
  54. 54. System Architecture
  55. 55. 3D Prototype System   3 Gear + Vuforia   Hand tracking + phone tracking   Freehand interaction on phone   Skeleton model   3D interaction   20 fps performance
  56. 56. Google Glass
  57. 57. User Experience   Truly Wearable Computing   Less than 46 grams   Hands-free Information Access   Voice interaction, Ego-vision camera   Intuitive User Interface   Touch, Gesture, Speech, Head Motion   Access to all Google Services   Map, Search, Location, Messaging, Email, etc
  58. 58. Contact Lens Display   Babak Parviz   University of Washington   MEMS components   Transparent elements   Micro-sensors   Challenges   Miniaturization   Assembly   Eye-safe
  59. 59. Contact Lens Prototype
  60. 60. Conclusion
  61. 61. Conclusions   AR experiences need new interaction methods   Enabling technologies are advancing quickly   Displays, tracking, depth capture devices   Natural user interfaces possible   Free-hand gesture, speech, intelligent interfaces   Important research for the future   Mobile, wearable, displays
  62. 62. More Information •  Mark Billinghurst –  Email: mark.billinghurst@hitlabnz.org –  Twitter: @marknb00 •  Website –  http://www.hitlabnz.org/
