The document discusses the evolution and various types of augmented reality (AR) interfaces, highlighting their definitions, characteristics, and interaction methods. It explores mobile, web-based, and tangible interfaces, emphasizing the integration of natural interactions, gesture recognition, and multimodal input. The potential for intelligent interfaces and future research opportunities in AR technology is also presented.
Overview of natural interfaces in augmented reality by Mark Billinghurst from HIT Lab NZ.
Augmented reality (AR) combines real and virtual images interactively in 3D with real-time registration.
Mobile AR dominates, with applications like Layar (10M+ users) and web-based AR using marker tracking.
History and evolution of AR interaction, emphasizing context-aware information browsing and mobile AR applications.
3D displays and tangible user interfaces to manipulate virtual objects with physical controls for collaboration.
Tangible interfaces enhance interaction by using physical controls with AR, such as in LevelHead and VOMAR applications.
Evolution of AR interaction, highlighting information viewing, 3D interfaces, and ongoing limitations of traditional methods.
Vision for AR includes natural interaction through environmental awareness and gesture input for intuitive experiences.
AR MicroMachines utilize environment awareness for realistic physical interactions, focusing on occlusion and physics simulation.
Details of HIT Lab NZ's gesture library architecture for recognizing and tracking gestures within AR environments.
Multimodal interaction combines speech and gestures, leading to faster performance times and preferred user experiences.
Future research areas include mobile gesture input, intelligent systems for AR, and the integration of virtual characters. AR's advances in tangible interaction and natural control methods present new opportunities for development and research.
Natural Interfaces for
Augmented Reality
Mark Billinghurst
HIT Lab NZ
University of Canterbury
3.
Augmented Reality Definition
Defining Characteristics [Azuma 97]
Combines Real and Virtual Images
- Both can be seen at the same time
Interactive in real-time
- The virtual content can be interacted with
Registered in 3D
- Virtual objects appear fixed in space
4.
AR Today
Most widely used AR is mobile or web based
Mobile AR
Outdoor AR (GPS + compass)
- Layar (10 million+ users), Junaio, etc
Indoor AR (image based tracking)
- QCAR, String etc
Web based (Flash)
FLARToolKit marker tracking
Markerless tracking
5.
AR Interaction
You can see spatially registered AR...
how can you interact with it?
6.
AR Interaction Today
Mostly simple interaction
Mobile
Outdoor (Junaio, Layar, Wikitude, etc)
- Viewing information in place, touch virtual tags
Indoor (Invizimals, Qualcomm demos)
- Change viewpoint, screen based (touch screen)
Web based
Change viewpoint, screen interaction (mouse)
1. AR Information Viewing
Information is registered to
real-world context
Hand held AR displays
Interaction
Manipulation of a window
into information space
2D/3D virtual viewpoint control
Applications
Context-aware information displays
Examples: NaviCam (Rekimoto et al., 1997), Cameleon, etc.
9.
Current AR Information Browsers
Mobile AR
GPS + compass
Many Applications
Layar
Wikitude
Acrossair
PressLite
Yelp
AR Car Finder
…
10.
2. 3D AR Interfaces
Virtual objects displayed and manipulated in 3D physical space
HMDs and 6DOF head-tracking
6DOF hand trackers for input
Interaction
Viewpoint control
Traditional 3D UI interaction: manipulation, selection, etc.
Example: Kiyokawa et al., 2000
Requires custom input devices
3. Augmented Surfaces and Tangible Interfaces
Basic principles
Virtual objects are projected
on a surface
Physical objects are used as
controls for virtual objects
Support for collaboration
Tangible User Interfaces (Ishii 97)
Create digital shadows for physical objects
Foreground: graspable UI
Background: ambient interfaces
16.
Tangible Interface: ARgroove
Collaborative Instrument
Exploring Physically Based Interaction
Move and track physical record
Map physical actions to MIDI output
- Translation, rotation
- Tilt, shake
Limitation
AR output shown on screen
Separation between input and output
18.
Lessons from Tangible Interfaces
Benefits
Physical objects make us smart (affordances, constraints)
Objects aid collaboration (shared meaning)
Objects increase understanding (cognitive artifacts)
Limitations
Difficult to change object properties
Limited display capabilities (project onto surface)
Separation between object and display
19.
4. Tangible AR
AR overcomes limitations of TUIs
enhance display possibilities
merge task/display space
provide public and private views
TUI + AR = Tangible AR
Apply TUI methods to AR interface design
20.
Example Tangible AR Applications
Use of natural physical object manipulations to
control virtual objects
LevelHead (Oliver)
Physical cubes become rooms
VOMAR (Kato 2000)
Furniture catalog book:
- Turn over the page to see new models
Paddle interaction:
- Push, shake, incline, hit, scoop
Evolution of AR Interaction
1. Information Viewing Interfaces
simple (conceptually!), unobtrusive
2. 3D AR Interfaces
expressive, creative, require attention
3. Tangible Interfaces
embedded into conventional environments
4. Tangible AR
Combines TUI input + AR display
23.
Limitations
Typical limitations
Simple/No interaction (viewpoint control)
Require custom devices
Single mode interaction
2D input for 3D (screen based interaction)
No understanding of real world
Explicit vs. implicit interaction
Unintelligent interfaces (no learning)
AR MicroMachines
AR experience with environment awareness
and physically-based interaction
Based on MS Kinect RGB-D sensor
Augmented environment supports
occlusion, shadows
physically-based interaction between real and
virtual objects
System Flow
The system flow consists of three sections:
Image Processing and Marker Tracking
Physics Simulation
Rendering
33.
Physics Simulation
Create virtual mesh over real world
Update at 10 fps – can move real objects
Used by physics engine for collision detection (virtual/real)
Used by OpenSceneGraph for occlusion and shadows
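A minimal sketch of this stage, assuming Bullet is the physics engine (as the Editor's Notes state): a collision mesh for the captured terrain is built from a depth-derived height grid. The function name, grid layout, and cell size are illustrative assumptions, not the original AR MicroMachines code.

#include <btBulletDynamicsCommon.h>
#include <vector>

btRigidBody* makeTerrainBody(const std::vector<float>& heights,
                             int gridW, int gridH, float cellSize) {
    btTriangleMesh* tris = new btTriangleMesh();
    for (int y = 0; y + 1 < gridH; ++y) {
        for (int x = 0; x + 1 < gridW; ++x) {
            // Two triangles per grid cell, heights taken from the depth image.
            btVector3 p00(x * cellSize, y * cellSize, heights[y * gridW + x]);
            btVector3 p10((x + 1) * cellSize, y * cellSize, heights[y * gridW + x + 1]);
            btVector3 p01(x * cellSize, (y + 1) * cellSize, heights[(y + 1) * gridW + x]);
            btVector3 p11((x + 1) * cellSize, (y + 1) * cellSize, heights[(y + 1) * gridW + x + 1]);
            tris->addTriangle(p00, p10, p11);
            tris->addTriangle(p00, p11, p01);
        }
    }
    btBvhTriangleMeshShape* shape = new btBvhTriangleMeshShape(tris, true);
    // Mass 0 makes the terrain static for each simulation step; the real
    // system rebuilds this mesh at ~10 fps so moved real objects are picked up.
    return new btRigidBody(0.0f, new btDefaultMotionState(), shape);
}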
Motivation
AR MicroMachines and PhobiAR
• Treated the environment as static – no tracking
• Tracked objects in 2D
More realistic interaction requires 3D gesture tracking
37.
Motivation
Occlusion Issues
AR MicroMachines only achieved realistic occlusion because the user’s viewpoint matched the Kinect’s
Proper occlusion requires a more complete model of scene objects
HITLabNZ’s Gesture Library
Architecture
o Supports PCL, OpenNI, OpenCV, and Kinect SDK.
o Provides access to depth, RGB, XYZRGB.
o Usage: Capturing color image, depth image and concatenated point clouds from a single or multiple cameras
o For example:
Kinect for Xbox 360
Kinect for Windows
Asus Xtion Pro Live
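A minimal sketch of this capture layer, assuming PCL's standard OpenNI grabber; it follows the public PCL tutorial rather than HIT Lab NZ's own interface, and only shows that depth and colour arrive fused as one XYZRGB(A) cloud per frame.

#include <pcl/io/openni_grabber.h>
#include <pcl/point_cloud.h>
#include <pcl/point_types.h>
#include <boost/function.hpp>
#include <iostream>
#include <thread>
#include <chrono>

// Called once per frame with a concatenated colour + depth point cloud.
void onCloud(const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr& cloud) {
    std::cout << "got " << cloud->size() << " points\n";
}

int main() {
    pcl::OpenNIGrabber grabber;   // Kinect for Xbox 360 / Asus Xtion Pro Live
    boost::function<void(const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr&)> f = &onCloud;
    grabber.registerCallback(f);
    grabber.start();
    while (true) std::this_thread::sleep_for(std::chrono::seconds(1));
}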
40.
HITLabNZ’s Gesture Library
Architecture
o Segment images and point clouds based on color, depth and space.
o Usage: Segmenting images or point clouds using color models, depth, or spatial properties such as location, shape and size.
o For example:
Skin color segmentation
Depth threshold
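For illustration, a hedged OpenCV sketch combining a skin-colour mask with a depth threshold. The YCrCb bounds and the 0.4–1.2 m depth window are common textbook values, not the library's actual parameters.

#include <opencv2/opencv.hpp>

// bgr: colour image (CV_8UC3); depthMillimetres: depth image (CV_16UC1).
cv::Mat segmentHand(const cv::Mat& bgr, const cv::Mat& depthMillimetres) {
    cv::Mat ycrcb, skin, nearby, hand;
    cv::cvtColor(bgr, ycrcb, cv::COLOR_BGR2YCrCb);
    // Classic skin-colour bounds in the Cr/Cb channels (illustrative values).
    cv::inRange(ycrcb, cv::Scalar(0, 133, 77), cv::Scalar(255, 173, 127), skin);
    // Keep only pixels between 0.4 m and 1.2 m from the sensor.
    cv::inRange(depthMillimetres, cv::Scalar(400), cv::Scalar(1200), nearby);
    cv::bitwise_and(skin, nearby, hand);   // colour AND depth
    return hand;                           // binary mask of candidate hand pixels
}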
41.
HITLabNZ’s Gesture Library
Architecture
o Identify and track objects between frames based on XYZRGB.
o Usage: Identifying current position/orientation of the tracked object in space.
o For example:
Training set of hand poses; colors represent unique regions of the hand.
Raw output (without cleaning) classified on real hand input (depth image).
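As a rough sketch of only the frame-to-frame part of this module, the code below matches each new detection (e.g. a classified hand region's 3D centroid) to the nearest existing track. The real library also uses colour and the trained per-region classifier shown above; the types here are hypothetical.

#include <vector>

struct Detection { float x, y, z; };          // centroid of a segmented object
struct Track     { int id; Detection last; }; // last known position of a tracked object

// Returns the id of the nearest track within maxDist, or -1 to start a new track.
int matchToTrack(const Detection& d, const std::vector<Track>& tracks, float maxDist) {
    int bestId = -1;
    float best = maxDist * maxDist;
    for (const Track& t : tracks) {
        float dx = d.x - t.last.x, dy = d.y - t.last.y, dz = d.z - t.last.z;
        float dist2 = dx * dx + dy * dy + dz * dz;
        if (dist2 < best) { best = dist2; bestId = t.id; }
    }
    return bestId;
}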
42.
HITLabNZ’s Gesture Library
Architecture
o Hand Recognition/Modeling
Skeleton based (for low resolution approximation)
Model based (for more accurate representation)
o Object Modeling (identification and tracking of rigid-body objects)
o Physical Modeling (physical interaction)
Sphere Proxy
Model based
Mesh based
o Usage: For general spatial interaction in AR/VR environments
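The sphere-proxy idea can be sketched with Bullet as follows (see also Editor's Note #44): a tracked real object, such as the hand, is approximated by a set of spheres and fed to the simulation as a kinematic body so it can push virtual objects. The compound-shape construction and flags are assumptions about one reasonable implementation, not the library's code.

#include <btBulletDynamicsCommon.h>
#include <vector>

btRigidBody* makeSphereProxy(const std::vector<btVector3>& centres, float radius) {
    btCompoundShape* compound = new btCompoundShape();
    for (const btVector3& c : centres) {
        btTransform t; t.setIdentity(); t.setOrigin(c);
        compound->addChildShape(t, new btSphereShape(radius));
    }
    btRigidBody* body = new btRigidBody(0.0f, new btDefaultMotionState(), compound);
    // Kinematic: the tracker drives the proxy's motion, the physics engine
    // only uses it to push dynamic (virtual) bodies around.
    body->setCollisionFlags(body->getCollisionFlags() |
                            btCollisionObject::CF_KINEMATIC_OBJECT);
    body->setActivationState(DISABLE_DEACTIVATION);
    return body;
}
// Each frame the tracker supplies new sphere centres and the proxy's
// transform is updated before stepping the simulation.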
HITLabNZ’s Gesture Library
Architecture
o Static (hand pose recognition)
o Dynamic (meaningful movement recognition)
o Context-based gesture recognition (gestures with context, e.g. pointing)
o Usage: Issuing commands/anticipating user intention and high-level interaction.
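As an illustration of the static layer only, here is a minimal nearest-neighbour pose classifier over a fixed-length feature vector (e.g. normalised fingertip-to-palm distances). The feature choice and types are hypothetical; dynamic and context-based recognition would add a temporal model and scene context on top of something like this.

#include <vector>
#include <string>
#include <limits>
#include <cstddef>

struct PoseSample { std::vector<float> features; std::string label; };

std::string classifyPose(const std::vector<float>& f,
                         const std::vector<PoseSample>& training) {
    std::string best = "unknown";
    float bestDist = std::numeric_limits<float>::max();
    for (const PoseSample& s : training) {
        // Squared Euclidean distance between feature vectors.
        float d = 0.0f;
        for (std::size_t i = 0; i < f.size() && i < s.features.size(); ++i)
            d += (f[i] - s.features[i]) * (f[i] - s.features[i]);
        if (d < bestDist) { bestDist = d; best = s.label; }
    }
    return best;
}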
Commands Recognized
Create Command "Make a blue chair": to create a virtual object and place it on the paddle.
Duplicate Command "Copy this": to duplicate a virtual object and place it on the paddle.
Grab Command "Grab table": to select a virtual object and place it on the paddle.
Place Command "Place here": to place the attached object in the workspace.
Move Command "Move the couch": to attach a virtual object in the workspace to the paddle so that it follows the paddle movement.
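A hedged sketch of how such a speech command might be fused with the concurrent gesture/paddle state: speech supplies the verb, and the deictic gesture (what the paddle points at) resolves which object is meant. The types and fusion rule are illustrative assumptions, not the study's actual implementation.

#include <string>
#include <optional>

struct SpeechCommand { std::string verb; std::string object; }; // e.g. "grab", "table"
struct GestureState  { int pointedObjectId; bool paddleVisible; };
struct Action        { std::string verb; int targetId; };

std::optional<Action> fuse(const SpeechCommand& s, const GestureState& g) {
    // "Grab table" / "Copy this": without a visible paddle or a pointed-at
    // object, the command cannot be grounded and is left unresolved.
    if (!g.paddleVisible || g.pointedObjectId < 0) return std::nullopt;
    return Action{ s.verb, g.pointedObjectId };
}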
Results
Average performance time (MMI and speech fastest)
Gesture: 15.44s
Speech: 12.38s
Multimodal: 11.78s
No difference in user errors
User subjective survey
Q1: How natural was it to manipulate the object?
- MMI, speech significantly better
70% preferred MMI, 25% speech only, 5% gesture only
Intelligent Interfaces
Most AR systems are stupid
Don’t recognize user behaviour
Don’t provide feedback
Don’t adapt to user
Especially important for training
Scaffolded learning
Moving beyond check-lists of actions
68.
Intelligent Interfaces
AR interface + intelligent tutoring system
ASPIRE constraint-based system (from UC)
Constraints
- relevance condition, satisfaction condition, feedback (see sketch below)
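A minimal sketch of what a constraint looks like in this constraint-based style: a relevance condition, a satisfaction condition, and feedback shown when the constraint is relevant but violated. The State type and check() helper are hypothetical stand-ins, not ASPIRE's API.

#include <functional>
#include <string>
#include <optional>

struct State { /* snapshot of the learner's current AR workspace (hypothetical) */ };

struct Constraint {
    std::function<bool(const State&)> relevance;     // does this constraint apply?
    std::function<bool(const State&)> satisfaction;  // is it met?
    std::string feedback;                            // shown if relevant but violated
};

// Returns feedback text when the constraint is relevant and unsatisfied.
std::optional<std::string> check(const Constraint& c, const State& s) {
    if (c.relevance(s) && !c.satisfaction(s)) return c.feedback;
    return std::nullopt;
}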
Evaluation Results
16 subjects, with and without ITS
Improved task completion
Improved learning
73.
Intelligent Agents
AR characters
Virtual embodiment of system
Multimodal input/output
Examples
AR Lego, Welbo, etc
Mr Virtuoso
- AR character more real, more fun
- On-screen 3D and AR similar in usefulness
Conclusions
AR traditionally involves tangible interaction
New technologies support natural interaction
Environment capture
Natural gestures
Multimodal interaction
Opportunities for future research
Mobile, intelligent systems, characters
76.
More Information
• Mark Billinghurst
– mark.billinghurst@hitlabnz.org
• Website
– http://www.hitlabnz.org/
Editor's Notes
#31 - To create an interaction volume, the Kinect is positioned above the desired interaction space facing downwards. - A reference marker is placed in the interaction space to calculate the transform between the Kinect coordinate system and the coordinate system used by the AR viewing camera. - Users can also wear color markers on their fingers for pre-defined gesture interaction.
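A hedged sketch of the calibration described in this note, assuming both devices report the reference marker's pose as a rigid transform; Eigen is used here only for illustration, the original implementation is not shown.

#include <Eigen/Geometry>

// marker_in_kinect: marker pose as seen by the Kinect (marker -> Kinect coords).
// marker_in_camera: marker pose as seen by the AR viewing camera.
// Returns the transform taking Kinect coordinates into camera coordinates:
//   p_camera = T_cm * T_km^-1 * p_kinect
Eigen::Isometry3d kinectToCamera(const Eigen::Isometry3d& marker_in_kinect,
                                 const Eigen::Isometry3d& marker_in_camera) {
    return marker_in_camera * marker_in_kinect.inverse();
}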
#35 - The OpenSceneGraph framework is used for rendering. The input video image is rendered as the background, with all the virtual objects rendered on top. - At the top level of the scene graph, the viewing transformation is applied such that all virtual objects are transformed so as to appear attached to the real world. - The trimesh is rendered as an array of quads, with an alpha value of zero. This allows realistic occlusion effects of the terrain and virtual objects, while not affecting the users’ view of the real environment. - A custom fragment shader was written to allow rendering of shadows to the invisible terrain.
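The note above describes alpha-zero quads; an equivalent and widely used OpenSceneGraph recipe, sketched below, instead disables colour writes so the captured terrain fills only the depth buffer and silently occludes virtual objects drawn after it. This is a substitute technique for illustration, not the original shader-based implementation.

#include <osg/Geometry>
#include <osg/ColorMask>
#include <osg/StateSet>

// Makes the terrain geometry write depth only: it stays invisible over the
// video background but hides virtual objects that lie behind it.
void makeDepthOnlyOccluder(osg::Geometry* terrain) {
    osg::StateSet* ss = terrain->getOrCreateStateSet();
    ss->setAttribute(new osg::ColorMask(false, false, false, false));
    // Negative bin number so the occluder is drawn before the default (0)
    // opaque bin that holds the virtual objects.
    ss->setRenderBinDetails(-1, "RenderBin");
}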
#37 Appearance-based interaction has been used at the Lab before, both in AR MicroMachines and PhobiAR. Flaws in these applications have motivated my work on advanced tracking and modeling. AR MicroMachines did not allow for dynamic interaction – a car could be picked up, but because the motion of the hand was not known, friction could not be simulated between the car and the hand. PhobiAR introduced tracking for dynamic interaction, but it really only tracked objects in 2D. I'll show you what I mean... As soon as the hand is flipped the tracking fails and the illusion of realistic interaction is broken. 3D tracking was required to make the interaction in both of these applications more realistic.
#38 Another issue with typical AR applications is the handling of occlusion. The Kinect allows a model of the environment to be developed, which can help in determining whether a real object is in front of a virtual one. MicroMachines had good success by assuming a situation such as that shown on the right, with all objects in the scene in contact with the ground. This was a fair assumption when most of the objects were books etc. However, in PhobiAR the user's hands were often above the ground, more like the scene on the left. The thing to notice is that these two scenes are indistinguishable from the Kinect's point of view, but completely different from the observer's point of view. The main problem is that we don't know enough about the shape of real-world objects to handle occlusion properly. My work aims to model real-world objects by combining views of the objects across multiple frames, allowing better occlusion.
#39 The gesture library will provide a C++ API for real-time recognition and tracking of hands and rigid-body objects in 3D environments. The library will support usage of single and multiple depth sensing cameras. Collision detection and physics simulation will be integrated for realistic physical interaction. Finally, learning algorithms will be implemented for recognizing hand gestures.
#40 The library will support usage of single and multiple depth sensing cameras. Aim for general consumer hardware.
#44 Interaction between real objects and the virtual balls was achieved by representing objects as collections of spheres. The location of the spheres was determined by the modeling stage while their motion was found during tracking. I used the Bullet physics engine for physics simulation.
#45 The AR scene was rendered using OpenSceneGraph. Because the Kinect’s viewpoint was also the user’s viewpoint, realistic occlusion was possible using the Kinect’s depth data. I did not have time to experiment with using the object models to improve occlusion from other viewpoints. Also, the addition of shadows could have significantly improved the realism of the application.