Natural Interfaces for Augmented Reality

Natural User Interfaces for Augmented Reality. Keynote speech given by Mark Billinghurst at the CHINZ 2012 conference, in Dunedin, July 2nd 2012.

Slide Notes

  • To create an interaction volume, the Kinect is positioned above the desired interaction space, facing downwards. A reference marker is placed in the interaction space to calculate the transform between the Kinect coordinate system and the coordinate system used by the AR viewing camera (see the sketch below). Users can also wear color markers on their fingers for pre-defined gesture interaction.
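
A minimal sketch (not the actual HIT Lab NZ code) of using a shared reference marker to relate the two coordinate systems. It assumes OpenCV, which the framework already uses; the matrix names and the helper function are illustrative.

```cpp
#include <opencv2/core/core.hpp>

// T_cam_marker: marker pose seen from the AR viewing camera (4x4 rigid transform).
// T_kinect_marker: the same marker's pose seen from the Kinect.
cv::Matx44d kinectToCamera(const cv::Matx44d& T_cam_marker,
                           const cv::Matx44d& T_kinect_marker)
{
    // Kinect coordinates -> marker coordinates -> AR camera coordinates.
    return T_cam_marker * T_kinect_marker.inv();
}
```
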
  • The OpenSceneGraph framework is used for rendering. The input video image is rendered as the background, with all the virtual objects rendered on top. At the top level of the scene graph, the viewing transformation is applied so that all virtual objects appear attached to the real world. The trimesh is rendered as an array of quads with an alpha value of zero; this allows realistic occlusion between the real terrain and the virtual objects without affecting the user's view of the real environment (a rough sketch follows). A custom fragment shader was written to allow rendering of shadows onto the invisible terrain.
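
A rough sketch, assuming OpenSceneGraph, of such an "invisible" occluder: the quads are fully transparent (alpha 0) but still write depth, so virtual objects behind real surfaces fail the depth test and appear occluded. Names are illustrative, not taken from the actual renderer.

```cpp
#include <osg/Geode>
#include <osg/Geometry>
#include <osg/Depth>

osg::ref_ptr<osg::Geode> makeInvisibleOccluder(osg::ref_ptr<osg::Vec3Array> verts)
{
    osg::ref_ptr<osg::Geometry> geom = new osg::Geometry;
    geom->setVertexArray(verts.get());
    geom->addPrimitiveSet(new osg::DrawArrays(GL_QUADS, 0,
                          static_cast<GLsizei>(verts->size())));

    // Fully transparent colour: nothing visible is drawn over the video image...
    osg::ref_ptr<osg::Vec4Array> colors = new osg::Vec4Array;
    colors->push_back(osg::Vec4(0.0f, 0.0f, 0.0f, 0.0f));
    geom->setColorArray(colors.get());
    geom->setColorBinding(osg::Geometry::BIND_OVERALL);

    osg::StateSet* ss = geom->getOrCreateStateSet();
    ss->setMode(GL_BLEND, osg::StateAttribute::ON);
    // ...but depth is still written, which is what produces the occlusion.
    ss->setAttributeAndModes(new osg::Depth(osg::Depth::LESS, 0.0, 1.0, true),
                             osg::StateAttribute::ON);
    ss->setRenderBinDetails(-1, "RenderBin");   // draw before the virtual objects

    osg::ref_ptr<osg::Geode> geode = new osg::Geode;
    geode->addDrawable(geom.get());
    return geode;
}
```
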
  • Appearance-based interaction has been used at the Lab before, both in AR MicroMachines and PhobiAR. Flaws in these applications have motivated my work on advanced tracking and modeling. AR MicroMachines did not allow for dynamic interaction – a car could be picked up, but because the motion of the hand was not known, friction could not be simulated between the car and the hand. PhobiAR introduced tracking for dynamic interaction, but it really only tracked objects in 2D. I'll show you what I mean: as soon as the hand is flipped, the tracking fails and the illusion of realistic interaction is broken. 3D tracking was required to make the interaction in both of these applications more realistic.
  • Another issue with typical AR applications is the handling of occlusion. The Kinect allows a model of the environment to be developed, which can help in determining whether a real object is in front of a virtual one. MicroMachines had good success by assuming a situation such as that shown on the right, with all objects in the scene in contact with the ground. This was a fair assumption when most of the objects were books etc. However, in PhobiAR the user's hands were often above the ground, more like the scene on the left. The thing to notice is that these two scenes are indistinguishable from the Kinect's point of view, but completely different from the observer's point of view. The main problem is that we don't know enough about the shape of real-world objects to handle occlusion properly. My work aims to model real-world objects by combining views of the objects across multiple frames, allowing better occlusion.
  • The gesture library will provide a C++ API for real-time recognition and tracking of hands and rigid-body objects in 3D environments. The library will support usage of single and multiple depth sensing cameras. Collision detection and physics simulation will be integrated for realistic physical interaction. Finally, learning algorithms will be implemented for recognizing hand gestures.
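
A purely hypothetical illustration of what such a C++ API surface could look like; none of these names are the actual HIT Lab NZ interface.

```cpp
#include <string>

struct Pose6DOF { float x, y, z, qx, qy, qz, qw; };  // position + quaternion orientation

class GestureLibrary {
public:
    virtual ~GestureLibrary() {}
    // Attach one or more depth-sensing cameras (e.g. Kinect, Asus Xtion).
    virtual bool addCamera(int deviceIndex) = 0;
    // Per-frame work: segmentation, hand/object tracking, physics update.
    virtual void update() = 0;
    // Current 6DOF pose of a tracked hand or rigid-body object.
    virtual Pose6DOF getHandPose(int handId) const = 0;
    virtual Pose6DOF getObjectPose(int objectId) const = 0;
    // Most recently recognized gesture for a hand ("point", "grab", ...).
    virtual std::string currentGesture(int handId) const = 0;
};
```
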
  • The library will support use of single or multiple depth-sensing cameras, with the aim of running on general consumer hardware.
  • Interaction between real objects and the virtual balls was achieved by representing objects as collections of spheres. The location of the spheres was determined by the modeling stage while their motion was found during tracking. I used the Bullet physics engine for physics simulation.
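
A minimal sketch, assuming the Bullet API, of the sphere representation described above: the tracked real object becomes a kinematic compound of spheres that follows the tracked pose and collides with the dynamic virtual balls. Names and parameters are illustrative.

```cpp
#include <btBulletDynamicsCommon.h>
#include <vector>

btRigidBody* makeSphereProxy(const std::vector<btVector3>& centers, btScalar radius)
{
    btCompoundShape* compound = new btCompoundShape();
    for (size_t i = 0; i < centers.size(); ++i) {
        btTransform local;
        local.setIdentity();
        local.setOrigin(centers[i]);                  // sphere centre from the modeling stage
        compound->addChildShape(local, new btSphereShape(radius));
    }
    // Mass 0 + kinematic flag: the body is driven by tracking, not by physics,
    // but it still pushes the dynamic virtual objects around.
    btRigidBody::btRigidBodyConstructionInfo info(0.0f, new btDefaultMotionState(), compound);
    btRigidBody* body = new btRigidBody(info);
    body->setCollisionFlags(body->getCollisionFlags() | btCollisionObject::CF_KINEMATIC_OBJECT);
    body->setActivationState(DISABLE_DEACTIVATION);
    return body;
}
```
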
  • The AR scene was rendered using OpenSceneGraph. Because the Kinect’s viewpoint was also the user’s viewpoint, realistic occlusion was possible using the Kinect’s depth data. I did not have time to experiment with using the object models to improve occlusion from other viewpoints. Also, the addition of shadows could have significantly improved the realism of the application.

Transcript

  • 1. Natural Interfaces for Augmented Reality Mark Billinghurst HIT Lab NZ University of Canterbury
  • 2. Augmented Reality Definition Defining Characteristics [Azuma 97]  Combines Real and Virtual Images - Both can be seen at the same time  Interactive in real-time - The virtual content can be interacted with  Registered in 3D - Virtual objects appear fixed in space
  • 3. AR Today Most widely used AR is mobile or web based Mobile AR  Outdoor AR (GPS + compass) - Layar (10 million+ users), Junaio, etc  Indoor AR (image based tracking) - QCAR, String etc Web based (Flash)  Flartoolkit marker tracking  Markerless tracking
  • 4. AR Interaction You can see spatially registered AR.. how can you interact with it?
  • 5. AR Interaction Today Mostly simple interaction Mobile  Outdoor (Junaio, Layar, Wikitude, etc) - Viewing information in place, touch virtual tags  Indoor (Invizimals, Qualcomm demos) - Change viewpoint, screen based (touch screen) Web based  Change viewpoint, screen interaction (mouse)
  • 6. History of AR Interaction
  • 7. 1. AR Information Viewing Information is registered to real-world context  Hand held AR displays Interaction  Manipulation of a window into information space  2D/3D virtual viewpoint control Applications  Context-aware information displays Examples  NaviCam, Cameleon, etc. (NaviCam: Rekimoto, et al. 1997)
  • 8. Current AR Information Browsers Mobile AR  GPS + compass Many Applications  Layar  Wikitude  Acrossair  PressLite  Yelp  AR Car Finder  …
  • 9. 2. 3D AR Interfaces Virtual objects displayed in 3D physical space and manipulated  HMDs and 6DOF head-tracking  6DOF hand trackers for input Interaction  Viewpoint control  Traditional 3D UI interaction: manipulation, selection, etc. (Kiyokawa, et al. 2000) Requires custom input devices
  • 10. VLEGO - AR 3D Interaction
  • 11. 3. Augmented Surfaces and Tangible Interfaces Basic principles  Virtual objects are projected on a surface  Physical objects are used as controls for virtual objects  Support for collaboration
  • 12. Augmented Surfaces Rekimoto, et al. 1998  Front projection  Marker-based tracking  Multiple projection surfaces
  • 13. Tangible User Interfaces (Ishii 97) Create digital shadows for physical objects Foreground  graspable UI Background  ambient interfaces
  • 14. Tangible Interface: ARgroove Collaborative Instrument Exploring Physically Based Interaction  Move and track physical record  Map physical actions to MIDI output - Translation, rotation - Tilt, shake Limitation  AR output shown on screen  Separation between input and output
  • 15. Lessons from Tangible Interfaces Benefits  Physical objects make us smart (affordances, constraints)  Objects aid collaboration (shared meaning)  Objects increase understanding (cognitive artifacts) Limitations  Difficult to change object properties  Limited display capabilities (project onto surface)  Separation between object and display
  • 16. 4: Tangible AR AR overcomes limitations of TUIs  enhance display possibilities  merge task/display space  provide public and private views TUI + AR = Tangible AR  Apply TUI methods to AR interface design
  • 17. Example Tangible AR Applications Use of natural physical object manipulations to control virtual objects LevelHead (Oliver)  Physical cubes become rooms VOMAR (Kato 2000)  Furniture catalog book: - Turn over the page to see new models  Paddle interaction: - Push, shake, incline, hit, scoop
  • 18. VOMAR Interface
  • 19. Evolution of AR Interaction 1. Information Viewing Interfaces  simple (conceptually!), unobtrusive 2. 3D AR Interfaces  expressive, creative, require attention 3. Tangible Interfaces  Embedded into conventional environments 4. Tangible AR  Combines TUI input + AR display
  • 20. Limitations Typical limitations  Simple/No interaction (viewpoint control)  Require custom devices  Single mode interaction  2D input for 3D (screen based interaction)  No understanding of real world  Explicit vs. implicit interaction  Unintelligent interfaces (no learning)
  • 21. Natural Interaction
  • 22. The Vision of AR
  • 23. To Make the Vision Real.. Hardware/software requirements  Contact lens displays  Free space hand/body tracking  Environment recognition  Speech/gesture recognition  Etc..
  • 24. Natural Interaction Automatically detecting real environment  Environmental awareness  Physically based interaction Gesture Input  Free-hand interaction Multimodal Input  Speech and gesture interaction  Implicit rather than Explicit interaction
  • 25. Environmental Awareness
  • 26. AR MicroMachines AR experience with environment awareness and physically-based interaction  Based on MS Kinect RGB-D sensor Augmented environment supports  occlusion, shadows  physically-based interaction between real and virtual objects
  • 27. Operating Environment
  • 28. Architecture Our framework uses five libraries:  OpenNI  OpenCV  OPIRA  Bullet Physics  OpenSceneGraph
  • 29. System Flow The system flow consists of three sections:  Image Processing and Marker Tracking  Physics Simulation  Rendering
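
A schematic sketch of that three-stage loop. Every type and function body here is a placeholder standing in for the real OpenNI/OPIRA/Bullet/OpenSceneGraph calls, not the framework's actual API.

```cpp
#include <opencv2/core/core.hpp>

struct Frame      { cv::Mat rgb; cv::Mat depth; };
struct CameraPose { cv::Matx44d T; };

Frame      captureFrame()                            { return Frame(); }      // OpenNI: RGB + depth
CameraPose trackMarker(const cv::Mat&)               { return CameraPose(); } // OPIRA marker tracking
void       updateTerrainMesh(const cv::Mat&)         {}  // rebuild collision/occlusion mesh (~10 fps)
void       stepPhysics(float)                        {}  // Bullet: advance the simulation
void       renderScene(const Frame&, const CameraPose&) {}  // OSG: video background + virtual objects

int main()
{
    for (;;) {
        Frame frame = captureFrame();            // 1. image processing & marker tracking
        CameraPose pose = trackMarker(frame.rgb);
        updateTerrainMesh(frame.depth);          // 2. physics simulation
        stepPhysics(1.0f / 60.0f);
        renderScene(frame, pose);                // 3. rendering
    }
}
```
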
  • 30. Physics Simulation Create virtual mesh over real world Update at 10 fps – can move real objects Used by physics engine for collision detection (virtual/real) Used by OpenSceneGraph for occlusion and shadows (a mesh-building sketch follows)
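
One possible way, assuming Bullet, to turn a depth-derived grid of points into a static collision mesh that is simply rebuilt when the depth data changes; the grid layout and function name are illustrative.

```cpp
#include <btBulletDynamicsCommon.h>
#include <vector>

btRigidBody* makeTerrainBody(const std::vector<btVector3>& gridPoints,
                             int cols, int rows)
{
    btTriangleMesh* mesh = new btTriangleMesh();
    for (int r = 0; r + 1 < rows; ++r) {
        for (int c = 0; c + 1 < cols; ++c) {
            const btVector3& p00 = gridPoints[r * cols + c];
            const btVector3& p01 = gridPoints[r * cols + c + 1];
            const btVector3& p10 = gridPoints[(r + 1) * cols + c];
            const btVector3& p11 = gridPoints[(r + 1) * cols + c + 1];
            mesh->addTriangle(p00, p01, p11);   // split each grid cell into two triangles
            mesh->addTriangle(p00, p11, p10);
        }
    }
    btBvhTriangleMeshShape* shape = new btBvhTriangleMeshShape(mesh, true);
    // Mass 0 => static collision object; rebuild the mesh when new depth data
    // arrives (the slide mentions roughly 10 fps updates).
    btRigidBody::btRigidBodyConstructionInfo info(0.0f, 0, shape);
    return new btRigidBody(info);
}
```
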
  • 31. Rendering: Occlusion, Shadows
  • 32. Natural Gesture Interaction HIT Lab NZ AR Gesture Library
  • 33. Motivation AR MicroMachines and PhobiAR • Treated the environment as static – no tracking • Tracked objects in 2D. More realistic interaction requires 3D gesture tracking
  • 34. Motivation: Occlusion Issues AR MicroMachines only achieved realistic occlusion because the user’s viewpoint matched the Kinect’s. Proper occlusion requires a more complete model of scene objects
  • 35. HITLabNZ’s Gesture Library Architecture
  • 36. HITLabNZ’s Gesture Library Architecture o Supports PCL, OpenNI, OpenCV, and Kinect SDK. o Provides access to depth, RGB, XYZRGB. o Usage: Capturing color image, depth image and concatenated point clouds from a single or multiple cameras o For example: Kinect for Xbox 360, Kinect for Windows, Asus Xtion Pro Live
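
A minimal capture sketch using PCL's OpenNI grabber, one of the back-ends listed on the slide; it follows the standard PCL example rather than the gesture library's own API.

```cpp
#include <pcl/io/openni_grabber.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <boost/bind.hpp>
#include <boost/thread/thread.hpp>

void cloudCallback(const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr& cloud)
{
    // Hand the concatenated XYZRGB(A) point cloud to segmentation/tracking here.
}

int main()
{
    pcl::OpenNIGrabber grabber;  // Kinect / Xtion Pro Live via OpenNI
    boost::function<void (const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr&)> f =
        boost::bind(&cloudCallback, _1);
    grabber.registerCallback(f);
    grabber.start();

    volatile bool running = true;
    while (running)
        boost::this_thread::sleep(boost::posix_time::seconds(1));  // work happens in the callback

    grabber.stop();
    return 0;
}
```
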
  • 37. HITLabNZ’s Gesture Library Architecture o Segment images and point clouds based on color, depth and space. o Usage: Segmenting images or point clouds using color models, depth, or spatial properties such as location, shape and size. o For example: skin color segmentation, depth threshold
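
A small sketch, assuming OpenCV, of the two example segmentations mentioned: a skin-colour range check in HSV space and a simple depth threshold. The threshold values are illustrative, not the library's.

```cpp
#include <opencv2/imgproc/imgproc.hpp>

cv::Mat segmentSkin(const cv::Mat& bgr)
{
    cv::Mat hsv, mask;
    cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);
    // Rough skin-tone range in HSV; a real system would use a trained colour model.
    cv::inRange(hsv, cv::Scalar(0, 40, 60), cv::Scalar(25, 180, 255), mask);
    return mask;
}

cv::Mat segmentByDepth(const cv::Mat& depthMm, double nearMm, double farMm)
{
    cv::Mat mask;
    // Keep only pixels inside the interaction volume [nearMm, farMm].
    cv::inRange(depthMm, cv::Scalar(nearMm), cv::Scalar(farMm), mask);
    return mask;
}
```
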
  • 38. HITLabNZ’s Gesture Library Architecture o Identify and track objects between frames based on XYZRGB. o Usage: Identifying current position/orientation of the tracked object in space. o For example: Training set of hand poses, colors represent unique regions of the hand. Raw output (without cleaning) classified on real hand input (depth image).
  • 39. HITLabNZ’s Gesture Library Architecture o Hand Recognition/Modeling  Skeleton based (for low resolution approximation)  Model based (for more accurate representation) o Object Modeling (identification and tracking of rigid-body objects) o Physical Modeling (physical interaction)  Sphere Proxy  Model based  Mesh based o Usage: For general spatial interaction in AR/VR environment
  • 40. Method: Represent models as collections of spheres moving with the models in the Bullet physics engine
  • 41. Method: Render AR scene with OpenSceneGraph, using depth map for occlusion. Shadows yet to be implemented
  • 42. Results
  • 43. HITLabNZ’s Gesture Library Architecture o Static (hand pose recognition) o Dynamic (meaningful movement recognition) o Context-based gesture recognition (gestures with context, e.g. pointing) o Usage: Issuing commands/anticipating user intention and high level interaction.
  • 44. Multimodal Interaction
  • 45. Multimodal Interaction Combined speech input Gesture and Speech complementary  Speech - modal commands, quantities  Gesture - selection, motion, qualities Previous work found multimodal interfaces intuitive for 2D/3D graphics interaction
  • 46. 1. Marker Based Multimodal Interface  Add speech recognition to VOMAR  Paddle + speech commands
  • 47. Commands Recognized Create Command "Make a blue chair": to create a virtual object and place it on the paddle. Duplicate Command "Copy this": to duplicate a virtual object and place it on the paddle. Grab Command "Grab table": to select a virtual object and place it on the paddle. Place Command "Place here": to place the attached object in the workspace. Move Command "Move the couch": to attach a virtual object in the workspace to the paddle so that it follows the paddle movement.
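
An illustrative sketch of dispatching recognized phrases to the commands listed above; the keyword matching is a stand-in, not the actual VOMAR speech grammar. It assumes the recognizer returns lower-case text.

```cpp
#include <string>

enum PaddleCommand { CREATE, DUPLICATE, GRAB, PLACE, MOVE, UNKNOWN };

PaddleCommand parseCommand(const std::string& phrase)
{
    if (phrase.find("make")  != std::string::npos) return CREATE;     // "make a blue chair"
    if (phrase.find("copy")  != std::string::npos) return DUPLICATE;  // "copy this"
    if (phrase.find("grab")  != std::string::npos) return GRAB;       // "grab table"
    if (phrase.find("place") != std::string::npos) return PLACE;      // "place here"
    if (phrase.find("move")  != std::string::npos) return MOVE;       // "move the couch"
    return UNKNOWN;
}
```
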
  • 48. System Architecture
  • 49. Object Relationships "Put chair behind the table" - Where is behind? View-specific regions
  • 50. User Evaluation Performance time  Speech + static paddle significantly faster Gesture-only condition less accurate for position/orientation Users preferred speech + paddle input
  • 51. Subjective Surveys
  • 52. 2. Free Hand Multimodal Input Use free hand to interact with AR content Recognize simple gestures No marker tracking Point, Move, Pick/Drop
  • 53. Multimodal Architecture
  • 54. Multimodal Fusion
  • 55. Hand Occlusion
  • 56. User Evaluation Change object shape, colour and position Conditions  Speech only, gesture only, multimodal Measure  performance time, error, subjective survey
  • 57. Experimental Setup: Change object shape and colour
  • 58. Results Average performance time (MMI, speech fastest)  Gesture: 15.44s  Speech: 12.38s  Multimodal: 11.78s No difference in user errors User subjective survey  Q1: How natural was it to manipulate the object? - MMI, speech significantly better  70% preferred MMI, 25% speech only, 5% gesture only
  • 59. Future Directions
  • 60. Future Research Mobile real world capture Mobile gesture input Intelligent interfaces Virtual characters
  • 61. Natural Gesture Interaction on Mobile Use mobile camera for hand tracking  Fingertip detection
  • 62. Evaluation Gesture input more than twice as slow as touch No difference in naturalness
  • 63. Intelligent Interfaces Most AR systems are stupid  Don’t recognize user behaviour  Don’t provide feedback  Don’t adapt to user Especially important for training  Scaffolded learning  Moving beyond check-lists of actions
  • 64. Intelligent Interfaces AR interface + intelligent tutoring system  ASPIRE constraint-based system (from UC)  Constraints - relevance condition, satisfaction condition, feedback (an illustrative sketch follows)
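
As an illustration of the constraint structure described on the slide (relevance condition, satisfaction condition, feedback), a hypothetical C++ representation; this is not ASPIRE's actual data model.

```cpp
#include <string>
#include <functional>

struct SolutionState { /* placeholder for the student's current solution */ };

struct Constraint {
    // When is this constraint relevant to the current solution?
    std::function<bool(const SolutionState&)> relevance;
    // If relevant, is it satisfied?
    std::function<bool(const SolutionState&)> satisfaction;
    // Feedback shown when the constraint is relevant but violated.
    std::string feedback;
};

// Relevant-but-violated constraints drive the corrective feedback.
bool isViolated(const Constraint& c, const SolutionState& s)
{
    return c.relevance(s) && !c.satisfaction(s);
}
```
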
  • 65. Domain Ontology
  • 66. Intelligent Feedback Actively monitors user behaviour  Implicit vs. explicit interaction Provides corrective feedback
  • 67. Evaluation Results 16 subjects, with and without ITS Improved task completion Improved learning
  • 68. Intelligent Agents AR characters  Virtual embodiment of system  Multimodal input/output Examples  AR Lego, Welbo, etc  Mr Virtuoso - AR character more real, more fun - On-screen 3D and AR similar in usefulness
  • 69. Conclusions
  • 70. Conclusions AR traditionally involves tangible interaction New technologies support natural interaction  Environment capture  Natural gestures  Multimodal interaction Opportunities for future research  Mobile, intelligent systems, characters
  • 71. More Information: Mark Billinghurst – mark.billinghurst@hitlabnz.org; Website – http://www.hitlabnz.org/