- To create an interaction volume, the Kinect is positioned above the desired interaction space, facing downwards.
- A reference marker is placed in the interaction space to calculate the transform between the Kinect coordinate system and the coordinate system used by the AR viewing camera (sketched below).
- Users can also wear color markers on their fingers for pre-defined gesture interaction.
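A minimal sketch of how the Kinect-to-camera transform might be composed once the reference marker has been seen by both sensors, assuming both marker poses are available as osg::Matrixd in OSG's row-vector convention; the function and parameter names are hypothetical:

    #include <osg/Matrixd>

    // Hypothetical sketch: both arguments are the reference marker's pose as seen
    // by each sensor (points transform as p_world = p_local * M in OSG).
    osg::Matrixd kinectToCamera(const osg::Matrixd& markerInKinect,  // marker pose in Kinect coordinates
                                const osg::Matrixd& markerInCamera)  // marker pose in AR camera coordinates
    {
        // A point goes Kinect frame -> marker frame -> AR camera frame.
        return osg::Matrixd::inverse(markerInKinect) * markerInCamera;
    }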
- The OpenSceneGraph framework is used for rendering. The input video image is rendered as the background, with all the virtual objects rendered on top.
- At the top level of the scene graph, the viewing transformation is applied so that all virtual objects appear attached to the real world.
- The trimesh is rendered as an array of quads with an alpha value of zero. This allows realistic occlusion between the terrain and virtual objects without affecting the user's view of the real environment (see the sketch below).
- A custom fragment shader was written to render shadows onto the invisible terrain.
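The exact state setup is not given in the slides; below is a minimal OpenSceneGraph sketch of one way the invisible occluder could be configured, assuming the terrain quads carry vertex colors with alpha = 0 and the video background is drawn before everything else:

    #include <osg/Node>
    #include <osg/StateSet>
    #include <osg/BlendFunc>

    // Sketch only: with standard alpha blending and alpha = 0, the terrain leaves the
    // video background visible but still writes depth, so virtual objects behind it
    // are occluded.  Rendering in bin -1 draws it before the virtual objects.
    void makeInvisibleOccluder(osg::Node* terrain)
    {
        osg::StateSet* ss = terrain->getOrCreateStateSet();
        ss->setAttributeAndModes(new osg::BlendFunc(GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA),
                                 osg::StateAttribute::ON);
        ss->setRenderBinDetails(-1, "RenderBin");
        ss->setMode(GL_LIGHTING, osg::StateAttribute::OFF);
    }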
Appearance-based interaction has been used at the Lab before, both in AR Micromachines and PhobiAR. Flaws in these applications motivated my work on advanced tracking and modeling. AR Micromachines did not allow for dynamic interaction: a car could be picked up, but because the motion of the hand was not known, friction could not be simulated between the car and the hand. PhobiAR introduced tracking for dynamic interaction, but it only tracked objects in 2D. I'll show you what I mean: as soon as the hand is flipped, the tracking fails and the illusion of realistic interaction is broken. 3D tracking was required to make the interaction in both of these applications more realistic.
Another issue with typical AR applications is the handling of occlusion. The Kinect allows a model of the environment to be built, which helps in determining whether a real object is in front of a virtual one. Micromachines had good success by assuming a situation such as that shown on the right, with all objects in the scene in contact with the ground. This was a fair assumption when most of the objects were books and the like. However, in PhobiAR the user's hands were often above the ground, more like the scene on the left. The thing to notice is that these two scenes are indistinguishable from the Kinect's point of view, but completely different from the observer's point of view. The main problem is that we don't know enough about the shape of real-world objects to handle occlusion properly. My work aims to model real-world objects by combining views of the objects across multiple frames, allowing better occlusion.
The gesture library will provide a C++ API for real-time recognition and tracking of hands and rigid-body objects in 3D environments. The library will support the use of single or multiple depth-sensing cameras, and it targets general consumer hardware. Collision detection and physics simulation will be integrated for realistic physical interaction. Finally, learning algorithms will be implemented for recognizing hand gestures.
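Since the API is still being designed, the following is only a hypothetical sketch of what a client-facing header might look like; all class and method names (GestureLibrary, HandState, addCamera, onGesture) are invented for illustration:

    #include <functional>
    #include <string>

    struct HandState {
        float position[3];     // palm position in sensor coordinates (metres)
        float orientation[4];  // palm orientation as a quaternion
        bool  isOpen;          // simple open/closed pose flag
    };

    class GestureLibrary {
    public:
        bool addCamera(int deviceId);   // register one depth camera; call repeatedly for multiple cameras
        void onGesture(std::function<void(const std::string& name,
                                          const HandState& hand)> callback);
        void update();                  // process the latest frames; call once per render loop
    };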
Interaction between real objects and the virtual balls was achieved by representing real objects as collections of spheres. The locations of the spheres were determined by the modeling stage, while their motion was found during tracking. I used the Bullet physics engine for the physics simulation (see the sketch below).
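A minimal Bullet sketch of the sphere-proxy idea, assuming the sphere centres for a tracked object come from the modeling stage; making the bodies kinematic lets them follow the tracked motion and push the dynamic virtual balls:

    #include <btBulletDynamicsCommon.h>
    #include <vector>

    // Sketch: create one kinematic sphere body per proxy sphere of a tracked object.
    std::vector<btRigidBody*> addSphereProxies(btDiscreteDynamicsWorld* world,
                                               const std::vector<btVector3>& centers,
                                               btScalar radius)
    {
        std::vector<btRigidBody*> bodies;
        for (const btVector3& c : centers) {
            btTransform start; start.setIdentity(); start.setOrigin(c);
            btRigidBody::btRigidBodyConstructionInfo info(
                0.0, new btDefaultMotionState(start), new btSphereShape(radius));
            btRigidBody* body = new btRigidBody(info);
            body->setCollisionFlags(body->getCollisionFlags() |
                                    btCollisionObject::CF_KINEMATIC_OBJECT);
            body->setActivationState(DISABLE_DEACTIVATION);
            world->addRigidBody(body);
            bodies.push_back(body);
        }
        return bodies;
    }
    // Each frame, the tracking result updates the bodies' motion states, so the
    // physics engine sees the spheres move and can transfer momentum to the balls.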
The AR scene was rendered using OpenSceneGraph. Because the Kinect’s viewpoint was also the user’s viewpoint, realistic occlusion was possible using the Kinect’s depth data. I did not have time to experiment with using the object models to improve occlusion from other viewpoints. Also, the addition of shadows could have significantly improved the realism of the application.
AR Today
Most widely used AR is mobile or web based.
Mobile AR:
- Outdoor AR (GPS + compass): Layar (10 million+ users), Junaio, etc.
- Indoor AR (image-based tracking): QCAR, String, etc.
Web based (Flash):
- FLARToolKit marker tracking
- Markerless tracking
1. AR Information Viewing
Information is registered to real-world context; hand-held AR displays.
Interaction: manipulation of a window into information space; 2D/3D virtual viewpoint control.
Applications: context-aware information displays.
Examples: NaviCam (Rekimoto et al. 1997), Cameleon, etc.
2. 3D AR Interfaces
Virtual objects displayed in 3D physical space and manipulated directly.
HMDs with 6DOF head tracking; 6DOF hand trackers for input.
Interaction: viewpoint control; traditional 3D UI interaction (manipulation, selection, etc.).
Requires custom input devices.
Example: Kiyokawa et al. 2000.
Tangible Interface: ARgroove
Collaborative instrument exploring physically based interaction.
Move and track a physical record; map physical actions to MIDI output:
- Translation, rotation
- Tilt, shake
Limitations: AR output shown on a screen; separation between input and output.
Lessons from Tangible Interfaces
Benefits:
- Physical objects make us smart (affordances, constraints)
- Objects aid collaboration (shared meaning)
- Objects increase understanding (cognitive artifacts)
Limitations:
- Difficult to change object properties
- Limited display capabilities (projection onto a surface)
- Separation between object and display
4. Tangible AR
AR overcomes the limitations of TUIs:
- Enhanced display possibilities
- Merged task/display space
- Public and private views
TUI + AR = Tangible AR: apply TUI methods to AR interface design.
Example Tangible AR Applications
Use natural physical object manipulations to control virtual objects.
LevelHead (Oliver): physical cubes become rooms.
VOMAR (Kato 2000): furniture catalog book:
- Turn over the page to see new models
Paddle interaction:
- Push, shake, incline, hit, scoop
Evolution of AR Interaction
1. Information Viewing interfaces: simple (conceptually!), unobtrusive
2. 3D AR interfaces: expressive, creative, require attention
3. Tangible interfaces: embedded into conventional environments
4. Tangible AR: combines TUI input + AR display
Limitations
Typical limitations:
- Simple or no interaction (viewpoint control only)
- Require custom devices
- Single-mode interaction
- 2D input for 3D (screen-based interaction)
- No understanding of the real world
- Explicit vs. implicit interaction
- Unintelligent interfaces (no learning)
AR MicroMachines
AR experience with environment awareness and physically-based interaction, based on the MS Kinect RGB-D sensor.
The augmented environment supports occlusion, shadows, and physically-based interaction between real and virtual objects.
Physics Simulation
Create a virtual mesh over the real world, updated at 10 fps so real objects can be moved.
The mesh is used by the physics engine for collision detection (virtual/real) and by OpenSceneGraph for occlusion and shadows (see the sketch below).
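The slides describe a virtual mesh rebuilt at ~10 fps; as a simpler stand-in for that trimesh, the sketch below exposes a depth-derived height grid to Bullet as a heightfield collision shape. The names and grid dimensions are illustrative; because the shape references the float array directly, refreshing the array each update effectively updates the collision mesh:

    #include <btBulletDynamicsCommon.h>
    #include <BulletCollision/CollisionShapes/btHeightfieldTerrainShape.h>
    #include <vector>

    // Sketch: a static (zero-mass) terrain body built from a width x depth grid of
    // heights in metres, resampled from the Kinect depth image.
    btRigidBody* makeTerrainBody(std::vector<float>& heights, int width, int depth,
                                 float minHeight, float maxHeight)
    {
        btHeightfieldTerrainShape* shape = new btHeightfieldTerrainShape(
            width, depth, heights.data(), /*heightScale=*/1.0f,
            minHeight, maxHeight, /*upAxis=*/1, PHY_FLOAT, /*flipQuadEdges=*/false);

        btTransform xform; xform.setIdentity();
        btRigidBody::btRigidBodyConstructionInfo info(
            0.0, new btDefaultMotionState(xform), shape);
        return new btRigidBody(info);
    }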
HITLabNZ’s Gesture Library: Architecture
o Supports PCL, OpenNI, OpenCV, and the Kinect SDK.
o Provides access to depth, RGB, and XYZRGB data.
o Usage: capturing color images, depth images, and concatenated point clouds from a single camera or multiple cameras (see the sketch below).
o For example: Kinect for Xbox 360, Kinect for Windows, Asus Xtion Pro Live.
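A minimal capture sketch using PCL's OpenNI grabber, one of the backends listed above; the device id "#1" selects the first camera, and a second grabber (e.g. "#2") could be started for multi-camera capture:

    #include <pcl/io/openni_grabber.h>
    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <boost/function.hpp>

    // Sketch: stream concatenated color+depth (XYZRGBA) point clouds from one device.
    void startCapture()
    {
        pcl::Grabber* grabber = new pcl::OpenNIGrabber("#1");

        boost::function<void(const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr&)> cb =
            [](const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr& cloud) {
                // hand the cloud on to segmentation / tracking
            };

        grabber->registerCallback(cb);
        grabber->start();
    }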
HITLabNZ’s Gesture Library: Architecture
o Segments images and point clouds based on color, depth, and space.
o Usage: segmenting images or point clouds using color models, depth, or spatial properties such as location, shape, and size (see the sketch below).
o For example: skin color segmentation, depth thresholding.
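A minimal OpenCV sketch of the two example segmenters, a skin-color mask in YCrCb space and a depth threshold; the Cr/Cb bounds are typical textbook values, not the library's tuned parameters:

    #include <opencv2/opencv.hpp>

    // Sketch: pixels that look like skin (YCrCb color model).
    cv::Mat skinMask(const cv::Mat& bgr)
    {
        cv::Mat ycrcb, mask;
        cv::cvtColor(bgr, ycrcb, cv::COLOR_BGR2YCrCb);
        cv::inRange(ycrcb, cv::Scalar(0, 133, 77), cv::Scalar(255, 173, 127), mask);
        return mask;
    }

    // Sketch: pixels whose depth (in millimetres) lies inside the working volume.
    cv::Mat depthMask(const cv::Mat& depthMm, double nearMm, double farMm)
    {
        cv::Mat mask;
        cv::inRange(depthMm, cv::Scalar(nearMm), cv::Scalar(farMm), mask);
        return mask;
    }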
HITLabNZ’s Gesture Library: Architecture
o Identifies and tracks objects between frames based on XYZRGB data.
o Usage: identifying the current position/orientation of the tracked object in space (see the sketch below).
o For example: a training set of hand poses, where colors represent unique regions of the hand; raw output (without cleaning) classified on real hand input (depth image).
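The library's per-region hand classifier is not reproduced here; as a simpler illustration of obtaining a tracked object's position and orientation from its segmented XYZRGB cloud, the sketch below uses the cloud centroid and a PCA of the points (the PCA orientation is sign-ambiguous):

    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <pcl/common/centroid.h>
    #include <pcl/common/pca.h>

    // Sketch: centroid gives the position, principal axes give a rough orientation.
    void estimatePose(const pcl::PointCloud<pcl::PointXYZRGB>::Ptr& object,
                      Eigen::Vector4f& position, Eigen::Matrix3f& orientation)
    {
        pcl::compute3DCentroid(*object, position);

        pcl::PCA<pcl::PointXYZRGB> pca;
        pca.setInputCloud(object);
        orientation = pca.getEigenVectors();   // columns are the principal axes
    }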
HITLabNZ’s Gesture Library: Architecture
o Hand recognition/modeling: skeleton based (for a low-resolution approximation) or model based (for a more accurate representation).
o Object modeling (identification and tracking of rigid-body objects).
o Physical modeling (physical interaction): sphere proxy, model based, or mesh based.
o Usage: general spatial interaction in AR/VR environments.
HITLabNZ’s Gesture Library: Architecture
o Static gesture recognition (hand pose recognition).
o Dynamic gesture recognition (meaningful movement recognition).
o Context-based gesture recognition (gestures with context, e.g. pointing).
o Usage: issuing commands, anticipating user intention, and high-level interaction (a pose-matching sketch follows).
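A minimal sketch of static pose recognition by nearest-neighbor matching of a fixed-length hand descriptor against labelled training examples; the descriptor contents and labels are illustrative, not the library's actual features:

    #include <limits>
    #include <string>
    #include <vector>

    struct PoseExample { std::vector<float> descriptor; std::string label; };

    // Sketch: return the label of the closest training example (squared Euclidean distance).
    std::string classifyPose(const std::vector<float>& d,
                             const std::vector<PoseExample>& training)
    {
        std::string best = "unknown";
        float bestDist = std::numeric_limits<float>::max();
        for (const PoseExample& ex : training) {
            float dist = 0.0f;
            for (size_t i = 0; i < d.size() && i < ex.descriptor.size(); ++i)
                dist += (d[i] - ex.descriptor[i]) * (d[i] - ex.descriptor[i]);
            if (dist < bestDist) { bestDist = dist; best = ex.label; }
        }
        return best;
    }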
Commands Recognized
Create command, "Make a blue chair": creates a virtual object and places it on the paddle.
Duplicate command, "Copy this": duplicates a virtual object and places it on the paddle.
Grab command, "Grab table": selects a virtual object and places it on the paddle.
Place command, "Place here": places the attached object in the workspace.
Move command, "Move the couch": attaches a virtual object in the workspace to the paddle so that it follows the paddle movement.
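A minimal sketch of how recognized speech commands could be routed together with the current paddle pose; the Pose struct, verbs, and handler signature are hypothetical glue, not the system's actual interface:

    #include <functional>
    #include <map>
    #include <string>

    struct Pose { float position[3]; float orientation[4]; };   // paddle pose from tracking

    class CommandDispatcher {
    public:
        using Handler = std::function<void(const std::string& args, const Pose& paddle)>;

        void registerCommand(const std::string& verb, Handler h) { handlers_[verb] = h; }

        // Called by the speech recognizer, e.g. verb = "make", args = "a blue chair".
        void onSpeech(const std::string& verb, const std::string& args, const Pose& paddle)
        {
            auto it = handlers_.find(verb);
            if (it != handlers_.end()) it->second(args, paddle);
        }

    private:
        std::map<std::string, Handler> handlers_;
    };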
Results
Average performance time (MMI and speech fastest): gesture 15.44 s, speech 12.38 s, multimodal 11.78 s.
No difference in user errors.
User subjective survey, Q1: "How natural was it to manipulate the object?": MMI and speech rated significantly better.
70% preferred MMI, 25% speech only, 5% gesture only.
Intelligent Interfaces
Most AR systems are unintelligent:
- Don't recognize user behaviour
- Don't provide feedback
- Don't adapt to the user
Especially important for training: scaffolded learning, moving beyond checklists of actions.
Intelligent Agents
AR characters: virtual embodiment of the system; multimodal input/output.
Examples: AR Lego, Welbo, etc.
Mr Virtuoso:
- AR character more real, more fun
- On-screen 3D and AR similar in usefulness