Natural Interfaces for Augmented Reality


Natural User Interfaces for Augmented Reality. Keynote speech given by Mark Billinghurst at the CHINZ 2012 conference, in Dunedin, July 2nd 2012.

  • - To create an interaction volume, the Kinect is positioned above the desired interaction space facing downwards. - A reference marker is placed in the interaction space to calculate the transform between the Kinect coordinate system and the coordinate system used by the AR viewing camera. - Users can also wear color markers on their fingers for pre-defined gesture interaction.
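The transform chaining this note describes can be sketched in a few lines of C++. This is a minimal illustration, not the actual HIT Lab NZ code: the matrix type and function names are invented, and a real system would use its tracking library's own math types. Because both the Kinect and the AR viewing camera see the same reference marker, points can be carried from Kinect coordinates through the marker frame into camera coordinates.

```cpp
#include <array>
#include <cmath>

// Illustrative 4x4 homogeneous transform type.
using Mat4 = std::array<std::array<double, 4>, 4>;

// out = a * b (standard 4x4 matrix product).
Mat4 matmul(const Mat4& a, const Mat4& b) {
    Mat4 out{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                out[i][j] += a[i][k] * b[k][j];
    return out;
}

// Invert a rigid transform [R | t]: the inverse is [R^T | -R^T t].
Mat4 rigidInverse(const Mat4& m) {
    Mat4 inv{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            inv[i][j] = m[j][i];               // R^T
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            inv[i][3] -= inv[i][j] * m[j][3];  // -R^T t
    inv[3][3] = 1.0;
    return inv;
}

// Chain: Kinect coordinates -> marker coordinates -> AR camera coordinates.
Mat4 kinectToCamera(const Mat4& cameraFromMarker, const Mat4& kinectFromMarker) {
    return matmul(cameraFromMarker, rigidInverse(kinectFromMarker));
}
```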
  • - The OpenSceneGraph framework is used for rendering. The input video image is rendered as the background, with all the virtual objects rendered on top. - At the top level of the scene graph, the viewing transformation is applied such that all virtual objects are transformed so as to appear attached to the real world. - The trimesh is rendered as an array of quads, with an alpha value of zero. This allows realistic occlusion effects of the terrain and virtual objects, while not affecting the users’ view of the real environment. - A custom fragment shader was written to allow rendering of shadows to the invisible terrain.
  • Appearance-based interaction has been used at the Lab before, both in AR Micromachines and PhobiAR. Flaws in these applications have motivated my work on advanced tracking and modeling. AR Micromachines did not allow for dynamic interaction – a car could be picked up, but because the motion of the hand was not known, friction could not be simulated between the car and the hand. PhobiAR introduced tracking for dynamic interaction, but it really only tracked objects in 2D. I’ll show you what I mean: as soon as the hand is flipped, the tracking fails and the illusion of realistic interaction is broken. 3D tracking was required to make the interaction in both of these applications more realistic.
  • Another issue with typical AR applications is the handling of occlusion. The Kinect allows a model of the environment to be developed, which can help in determining whether a real object is in front of a virtual one. Micromachines had good success by assuming a situation such as that shown on the right, with all objects in the scene in contact with the ground. This was a fair assumption when most of the objects were books etc. However, in PhobiAR the user’s hands were often above the ground, more like the scene on the left. The thing to notice is that these two scenes are indistinguishable from the Kinect’s point of view, but completely different from the observer’s point of view. The main problem is that we don’t know enough about the shape of real-world objects to handle occlusion properly. My work aims to model real-world objects by combining views of the objects across multiple frames, allowing better occlusion.
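At its core, depth-based occlusion is a per-pixel comparison between the sensed depth of the real scene and the depth of the virtual fragment. A rough CPU-side sketch (function and buffer names are illustrative; a real renderer would do this in the depth buffer):

```cpp
#include <vector>

// A virtual fragment is hidden when the sensed real-world surface lies
// closer to the camera than the fragment. A sensed depth of 0 means
// "no reading" (a Kinect hole), so the virtual fragment is drawn.
std::vector<bool> occlusionMask(const std::vector<float>& realDepth,
                                const std::vector<float>& virtualDepth) {
    std::vector<bool> occluded(realDepth.size(), false);
    for (std::size_t i = 0; i < realDepth.size(); ++i)
        occluded[i] = realDepth[i] > 0.0f && realDepth[i] < virtualDepth[i];
    return occluded;
}
```

The limitation described above is exactly that `realDepth` is only valid from the Kinect's own viewpoint; from any other viewpoint a fuller object model is needed.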
  • The gesture library will provide a C++ API for real-time recognition and tracking of hands and rigid-body objects in 3D environments. The library will support usage of single and multiple depth sensing cameras. Collision detection and physics simulation will be integrated for realistic physical interaction. Finally, learning algorithms will be implemented for recognizing hand gestures.
  • The library will support usage of single and multiple depth sensing cameras. Aim for general consumer hardware.
  • Interaction between real objects and the virtual balls was achieved by representing objects as collections of spheres. The location of the spheres was determined by the modeling stage while their motion was found during tracking. I used the Bullet physics engine for physics simulation.
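Representing objects as sphere collections reduces collision detection to sphere-sphere distance tests. A minimal sketch of the idea (in practice Bullet performs these tests; the types here are invented for illustration):

```cpp
#include <vector>

struct Sphere { double x, y, z, r; };

// Two proxy spheres collide when the squared distance between their
// centres is less than the squared sum of their radii.
bool spheresCollide(const Sphere& a, const Sphere& b) {
    double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    double sum = a.r + b.r;
    return dx * dx + dy * dy + dz * dz < sum * sum;
}

// Two objects (each a collection of proxy spheres) collide when any
// pair of their spheres collides.
bool proxiesCollide(const std::vector<Sphere>& a, const std::vector<Sphere>& b) {
    for (const Sphere& s : a)
        for (const Sphere& t : b)
            if (spheresCollide(s, t)) return true;
    return false;
}
```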
  • The AR scene was rendered using OpenSceneGraph. Because the Kinect’s viewpoint was also the user’s viewpoint, realistic occlusion was possible using the Kinect’s depth data. I did not have time to experiment with using the object models to improve occlusion from other viewpoints. Also, the addition of shadows could have significantly improved the realism of the application.

    1. Natural Interfaces for Augmented Reality Mark Billinghurst HIT Lab NZ University of Canterbury
    2. Augmented Reality Definition Defining Characteristics [Azuma 97]  Combines Real and Virtual Images - Both can be seen at the same time  Interactive in real-time - The virtual content can be interacted with  Registered in 3D - Virtual objects appear fixed in space
     3. AR Today Most widely used AR is mobile or web based Mobile AR  Outdoor AR (GPS + compass) - Layar (10 million+ users), Junaio, etc  Indoor AR (image based tracking) - QCAR, String etc Web based (Flash)  FLARToolKit marker tracking  Markerless tracking
     4. AR Interaction You can see spatially registered AR, so how can you interact with it?
    5. AR Interaction Today Mostly simple interaction Mobile  Outdoor (Junaio, Layar, Wikitude, etc) - Viewing information in place, touch virtual tags  Indoor (Invizimals, Qualcomm demos) - Change viewpoint, screen based (touch screen) Web based  Change viewpoint, screen interaction (mouse)
    6. History of AR Interaction
     7. 1. AR Information Viewing Information is registered to real-world context  Hand held AR displays Interaction  Manipulation of a window into information space  2D/3D virtual viewpoint control Applications  Context-aware information displays Examples  NaviCam (Rekimoto et al. 1997), Cameleon, etc
    8. Current AR Information Browsers Mobile AR  GPS + compass Many Applications  Layar  Wikitude  Acrossair  PressLite  Yelp  AR Car Finder  …
     9. 2. 3D AR Interfaces Virtual objects displayed in 3D physical space and manipulated  HMDs and 6DOF head-tracking  6DOF hand trackers for input Interaction  Viewpoint control  Traditional 3D UI interaction: manipulation, selection, etc. (Kiyokawa et al. 2000) Requires custom input devices
    10. VLEGO - AR 3D Interaction
     11. 3. Augmented Surfaces and Tangible Interfaces Basic principles  Virtual objects are projected on a surface  Physical objects are used as controls for virtual objects  Support for collaboration
    12. Augmented Surfaces Rekimoto, et al. 1998  Front projection  Marker-based tracking  Multiple projection surfaces
    13. Tangible User Interfaces (Ishii 97) Create digital shadows for physical objects Foreground  graspable UI Background  ambient interfaces
     14. Tangible Interface: ARgroove Collaborative Instrument Exploring Physically Based Interaction  Move and track physical record  Map physical actions to MIDI output - Translation, rotation - Tilt, shake Limitation  AR output shown on screen  Separation between input and output
    15. Lessons from Tangible Interfaces Benefits  Physical objects make us smart (affordances, constraints)  Objects aid collaboration (shared meaning)  Objects increase understanding (cognitive artifacts) Limitations  Difficult to change object properties  Limited display capabilities (project onto surface)  Separation between object and display
     16. 4. Tangible AR AR overcomes limitations of TUIs  enhance display possibilities  merge task/display space  provide public and private views TUI + AR = Tangible AR  Apply TUI methods to AR interface design
    17. Example Tangible AR Applications Use of natural physical object manipulations to control virtual objects LevelHead (Oliver)  Physical cubes become rooms VOMAR (Kato 2000)  Furniture catalog book: - Turn over the page to see new models  Paddle interaction: - Push, shake, incline, hit, scoop
    18. VOMAR Interface
     19. Evolution of AR Interaction 1. Information Viewing Interfaces  simple (conceptually!), unobtrusive 2. 3D AR Interfaces  expressive, creative, require attention 3. Tangible Interfaces  Embedded into conventional environments 4. Tangible AR  Combines TUI input + AR display
    20. Limitations Typical limitations  Simple/No interaction (viewpoint control)  Require custom devices  Single mode interaction  2D input for 3D (screen based interaction)  No understanding of real world  Explicit vs. implicit interaction  Unintelligent interfaces (no learning)
    21. Natural Interaction
    22. The Vision of AR
    23. To Make the Vision Real.. Hardware/software requirements  Contact lens displays  Free space hand/body tracking  Environment recognition  Speech/gesture recognition  Etc..
    24. Natural Interaction Automatically detecting real environment  Environmental awareness  Physically based interaction Gesture Input  Free-hand interaction Multimodal Input  Speech and gesture interaction  Implicit rather than Explicit interaction
    25. Environmental Awareness
    26. AR MicroMachines AR experience with environment awareness and physically-based interaction  Based on MS Kinect RGB-D sensor Augmented environment supports  occlusion, shadows  physically-based interaction between real and virtual objects
    27. Operating Environment
    28. Architecture Our framework uses five libraries:  OpenNI  OpenCV  OPIRA  Bullet Physics  OpenSceneGraph
    29. System Flow The system flow consists of three sections:  Image Processing and Marker Tracking  Physics Simulation  Rendering
    30. Physics Simulation Create virtual mesh over real world Update at 10 fps – can move real objects Use by physics engine for collision detection (virtual/real) Use by OpenScenegraph for occlusion and shadows
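Building the virtual mesh over the real world amounts to lifting a grid of depth samples into vertices and connecting each grid cell as a quad (the notes mention the trimesh is rendered as an array of quads). The sketch below is illustrative; the type names are invented, intrinsics are ignored for brevity, and the real system feeds the result to Bullet and OpenSceneGraph:

```cpp
#include <array>
#include <vector>

struct Vertex { float x, y, z; };

// Lift a w x h grid of depth samples into vertices: the (row, col)
// position gives x/y and the sensed depth gives z. A real system would
// back-project each sample through the camera model instead.
std::vector<Vertex> depthToVertices(const std::vector<float>& depth, int w, int h) {
    std::vector<Vertex> verts;
    verts.reserve(depth.size());
    for (int row = 0; row < h; ++row)
        for (int col = 0; col < w; ++col)
            verts.push_back({float(col), float(row), depth[row * w + col]});
    return verts;
}

// Each grid cell becomes one quad (four corner indices), giving
// (w-1)*(h-1) quads for the physics mesh and the occlusion pass.
std::vector<std::array<int, 4>> gridQuads(int w, int h) {
    std::vector<std::array<int, 4>> quads;
    for (int row = 0; row + 1 < h; ++row)
        for (int col = 0; col + 1 < w; ++col)
            quads.push_back({row * w + col, row * w + col + 1,
                             (row + 1) * w + col + 1, (row + 1) * w + col});
    return quads;
}
```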
     31. Rendering: Occlusion, Shadows
    32. Natural Gesture Interaction HIT Lab NZ AR Gesture Library
     33. Motivation AR MicroMachines and PhobiAR • Treated the environment as static – no tracking • Tracked objects in 2D More realistic interaction requires 3D gesture tracking
     34. Motivation: Occlusion Issues AR MicroMachines only achieved realistic occlusion because the user’s viewpoint matched the Kinect’s. Proper occlusion requires a more complete model of scene objects.
     35. HITLabNZ’s Gesture Library: Architecture
     36. HITLabNZ’s Gesture Library: Architecture o Supports PCL, OpenNI, OpenCV, and Kinect SDK. o Provides access to depth, RGB, XYZRGB. o Usage: Capturing color image, depth image and concatenated point clouds from a single or multiple cameras o For example: Kinect for Xbox 360 Kinect for Windows Asus Xtion Pro Live
     37. HITLabNZ’s Gesture Library: Architecture o Segment images and point clouds based on color, depth and space. o Usage: Segmenting images or point clouds using color models, depth, or spatial properties such as location, shape and size. o For example: Skin color segmentation Depth threshold
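The depth-threshold case of this segmentation stage is simple to sketch: keep only points whose depth falls inside the interaction volume. The point type mirrors the XYZRGB data the library exposes, but the code is an illustration, not the library's API:

```cpp
#include <vector>

struct PointXYZRGB { float x, y, z; unsigned char r, g, b; };

// Keep only points whose depth (z) lies in [zNear, zFar]; everything
// outside the interaction volume is discarded before tracking.
std::vector<PointXYZRGB> depthSegment(const std::vector<PointXYZRGB>& cloud,
                                      float zNear, float zFar) {
    std::vector<PointXYZRGB> kept;
    for (const PointXYZRGB& p : cloud)
        if (p.z >= zNear && p.z <= zFar) kept.push_back(p);
    return kept;
}
```

Skin-color segmentation works the same way, with the predicate testing a color model instead of depth.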
     38. HITLabNZ’s Gesture Library: Architecture o Identify and track objects between frames based on XYZRGB. o Usage: Identifying current position/orientation of the tracked object in space. o For example: Training set of hand poses, colors represent unique regions of the hand. Raw output (without cleaning) classified on real hand input (depth image).
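Frame-to-frame identification can be approximated by nearest-neighbour association of object centroids. This greedy sketch is one simple assumed approach, not the library's actual tracker:

```cpp
#include <cstddef>
#include <vector>

struct Point3 { float x, y, z; };

// Greedy association: each tracked object is matched to the nearest
// detection in the new frame, provided it lies within maxDist.
// Returns, per tracked object, the index of its match or -1 if lost.
std::vector<int> associate(const std::vector<Point3>& tracked,
                           const std::vector<Point3>& detections,
                           float maxDist) {
    std::vector<int> match(tracked.size(), -1);
    for (std::size_t i = 0; i < tracked.size(); ++i) {
        float best = maxDist * maxDist;
        for (std::size_t j = 0; j < detections.size(); ++j) {
            float dx = tracked[i].x - detections[j].x;
            float dy = tracked[i].y - detections[j].y;
            float dz = tracked[i].z - detections[j].z;
            float d2 = dx * dx + dy * dy + dz * dz;
            if (d2 < best) { best = d2; match[i] = int(j); }
        }
    }
    return match;
}
```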
     39. HITLabNZ’s Gesture Library: Architecture o Hand Recognition/Modeling  Skeleton based (for low resolution approximation)  Model based (for more accurate representation) o Object Modeling (identification and tracking rigid-body objects) o Physical Modeling (physical interaction)  Sphere Proxy  Model based  Mesh based o Usage: For general spatial interaction in AR/VR environment
     40. Method Represent models as collections of spheres moving with the models in the Bullet physics engine
     41. Method Render AR scene with OpenSceneGraph, using the depth map for occlusion; shadows yet to be implemented
    42. Results
     43. HITLabNZ’s Gesture Library: Architecture o Static (hand pose recognition) o Dynamic (meaningful movement recognition) o Context-based gesture recognition (gestures with context, e.g. pointing) o Usage: Issuing commands/anticipating user intention and high level interaction.
    44. Multimodal Interaction
     45. Multimodal Interaction Combined speech input Gesture and Speech complementary  Speech - modal commands, quantities  Gesture - selection, motion, qualities Previous work found multimodal interfaces intuitive for 2D/3D graphics interaction
    46. 1. Marker Based Multimodal Interface  Add speech recognition to VOMAR  Paddle + speech commands
    47. Commands Recognized Create Command "Make a blue chair": to create a virtual object and place it on the paddle. Duplicate Command "Copy this": to duplicate a virtual object and place it on the paddle. Grab Command "Grab table": to select a virtual object and place it on the paddle. Place Command "Place here": to place the attached object in the workspace. Move Command "Move the couch": to attach a virtual object in the workspace to the paddle so that it follows the paddle movement.
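The command set above can be dispatched by keying on the leading verb of the recognized utterance. This sketch is illustrative (the enum and function names are invented; the actual system used a speech recognizer, not string matching), and the rest of the utterance (object name, colour) would be parsed separately:

```cpp
#include <sstream>
#include <string>

enum class Action { Create, Duplicate, Grab, Place, Move, Unknown };

// Map the leading keyword of a recognized utterance to a paddle action.
Action parseCommand(const std::string& utterance) {
    std::istringstream in(utterance);
    std::string verb;
    in >> verb;
    if (verb == "Make")  return Action::Create;     // "Make a blue chair"
    if (verb == "Copy")  return Action::Duplicate;  // "Copy this"
    if (verb == "Grab")  return Action::Grab;       // "Grab table"
    if (verb == "Place") return Action::Place;      // "Place here"
    if (verb == "Move")  return Action::Move;       // "Move the couch"
    return Action::Unknown;
}
```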
    48. System Architecture
     49. Object Relationships “Put chair behind the table” Where is behind? View-specific regions
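"Behind" is view-specific: a point is behind the reference object when it lies further along the line of sight from the viewpoint through that object. One simple way to sketch the test (types and names are illustrative, and the actual system used regions rather than a single half-space test):

```cpp
struct Vec3 { double x, y, z; };

static double dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// A point is "behind" the reference object, for this viewpoint, when its
// offset from the reference points away from the viewer, i.e. has a
// positive component along the gaze direction.
bool isBehind(const Vec3& point, const Vec3& reference, const Vec3& viewpoint) {
    Vec3 gaze{reference.x - viewpoint.x, reference.y - viewpoint.y,
              reference.z - viewpoint.z};
    Vec3 offset{point.x - reference.x, point.y - reference.y,
                point.z - reference.z};
    return dot(offset, gaze) > 0.0;
}
```

Moving the viewpoint to the opposite side flips which region counts as "behind", which is exactly why the relation cannot be resolved from geometry alone.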
    50. User Evaluation Performance time  Speech + static paddle significantly faster Gesture-only condition less accurate for position/orientation Users preferred speech + paddle input
    51. Subjective Surveys
    52. 2. Free Hand Multimodal Input Use free hand to interact with AR content Recognize simple gestures No marker tracking Point Move Pick/Drop
    53. Multimodal Architecture
    54. Multimodal Fusion
    55. Hand Occlusion
    56. User Evaluation Change object shape, colour and position Conditions  Speech only, gesture only, multimodal Measure  performance time, error, subjective survey
     57. Experimental Setup Change object shape and colour
    58. Results Average performance time (MMI, speech fastest)  Gesture: 15.44s  Speech: 12.38s  Multimodal: 11.78s No difference in user errors User subjective survey  Q1: How natural was it to manipulate the object? - MMI, speech significantly better  70% preferred MMI, 25% speech only, 5% gesture only
    59. Future Directions
    60. Future Research Mobile real world capture Mobile gesture input Intelligent interfaces Virtual characters
    61. Natural Gesture Interaction on Mobile Use mobile camera for hand tracking  Fingertip detection
    62. Evaluation Gesture input more than twice as slow as touch No difference in naturalness
    63. Intelligent Interfaces Most AR systems stupid  Don’t recognize user behaviour  Don’t provide feedback  Don’t adapt to user Especially important for training  Scaffolded learning  Moving beyond check-lists of actions
    64. Intelligent Interfaces AR interface + intelligent tutoring system  ASPIRE constraint based system (from UC)  Constraints - relevance cond., satisfaction cond., feedback
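A constraint in this style pairs a relevance condition with a satisfaction condition and a feedback message: whenever the relevance condition holds for the current solution state but the satisfaction condition does not, the feedback is shown. The sketch below is illustrative only; the `State` fields are hypothetical and ASPIRE's actual representation differs:

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical solution state for a small training task.
struct State { bool placedBattery; bool connectedLeads; };

// A constraint: if relevant(state) holds, satisfied(state) must also hold,
// otherwise the feedback message applies.
struct Constraint {
    std::function<bool(const State&)> relevant;
    std::function<bool(const State&)> satisfied;
    std::string feedback;
};

// Collect feedback for every relevant-but-violated constraint.
std::vector<std::string> evaluate(const std::vector<Constraint>& constraints,
                                  const State& state) {
    std::vector<std::string> messages;
    for (const Constraint& c : constraints)
        if (c.relevant(state) && !c.satisfied(state))
            messages.push_back(c.feedback);
    return messages;
}
```

This structure is what lets the tutor give targeted corrective feedback instead of checking the user's actions against a fixed checklist.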
    65. Domain Ontology
    66. Intelligent Feedback Actively monitors user behaviour  Implicit vs. explicit interaction Provides corrective feedback
    67. Evaluation Results 16 subjects, with and without ITS Improved task completion Improved learning
    68. Intelligent Agents AR characters  Virtual embodiment of system  Multimodal input/output Examples  AR Lego, Welbo, etc  Mr Virtuoso - AR character more real, more fun - On-screen 3D and AR similar in usefulness
    69. Conclusions
    70. Conclusions AR traditionally involves tangible interaction New technologies support natural interaction  Environment capture  Natural gestures  Multimodal interaction Opportunities for future research  Mobile, intelligent systems, characters
     71. More Information • Mark Billinghurst – • Website –