Natural Interfaces for Augmented Reality
1. Natural Interfaces for
Augmented Reality
Mark Billinghurst
HIT Lab NZ
University of Canterbury
2.
3. Augmented Reality Definition
Defining Characteristics [Azuma 97]
Combines Real and Virtual Images
- Both can be seen at the same time
Interactive in real-time
- The virtual content can be interacted with
Registered in 3D
- Virtual objects appear fixed in space
4. AR Today
Most widely used AR is mobile or web based
Mobile AR
Outdoor AR (GPS + compass)
- Layar (10 million+ users), Junaio, etc.
Indoor AR (image-based tracking)
- QCAR, String, etc.
Web based (Flash)
FLARToolKit marker tracking
Markerless tracking
5. AR Interaction
You can see spatially registered AR…
but how can you interact with it?
6. AR Interaction Today
Mostly simple interaction
Mobile
Outdoor (Junaio, Layar, Wikitude, etc)
- Viewing information in place, touch virtual tags
Indoor (Invizimals, Qualcomm demos)
- Change viewpoint, screen based (touch screen)
Web based
Change viewpoint, screen interaction (mouse)
8. 1. AR Information Viewing
Information is registered to
real-world context
Hand held AR displays
Interaction
Manipulation of a window
into information space
2D/3D virtual viewpoint control
Applications
Context-aware information displays
Examples
NaviCam (Rekimoto et al., 1997), Cameleon, etc.
9. Current AR Information Browsers
Mobile AR
GPS + compass
Many Applications
Layar
Wikitude
Acrossair
PressLite
Yelp
AR Car Finder
…
10. 2. 3D AR Interfaces
Virtual objects displayed and
manipulated in 3D physical space
HMDs and 6DOF head-tracking
6DOF hand trackers for input
Interaction
Viewpoint control
Traditional 3D UI interaction:
manipulation, selection, etc. (Kiyokawa et al., 2000)
Requires custom input devices
12. 3. Augmented Surfaces and
Tangible Interfaces
Basic principles
Virtual objects are projected
on a surface
Physical objects are used as
controls for virtual objects
Support for collaboration
15. Tangible User Interfaces (Ishii 97)
Create digital shadows
for physical objects
Foreground
graspable UI
Background
ambient interfaces
16. Tangible Interface: ARgroove
Collaborative Instrument
Exploring Physically Based Interaction
Move and track physical record
Map physical actions to MIDI output
- Translation, rotation
- Tilt, shake
Limitation
AR output shown on screen
Separation between input and output
17.
18. Lessons from Tangible Interfaces
Benefits
Physical objects make us smart (affordances, constraints)
Objects aid collaboration (shared meaning)
Objects increase understanding (cognitive artifacts)
Limitations
Difficult to change object properties
Limited display capabilities (project onto surface)
Separation between object and display
19. 4: Tangible AR
AR overcomes the limitations of TUIs
enhance display possibilities
merge task/display space
provide public and private views
TUI + AR = Tangible AR
Apply TUI methods to AR interface design
20. Example Tangible AR Applications
Use of natural physical object manipulations to
control virtual objects
LevelHead (Oliver)
Physical cubes become rooms
VOMAR (Kato 2000)
Furniture catalog book:
- Turn over the page to see new models
Paddle interaction:
- Push, shake, incline, hit, scoop
22. Evolution of AR Interaction
1. Information Viewing Interfaces
simple (conceptually!), unobtrusive
2. 3D AR Interfaces
expressive, creative, require attention
3. Tangible Interfaces
Embedded into conventional environments
4. Tangible AR
Combines TUI input + AR display
23. Limitations
Typical limitations
Simple/No interaction (viewpoint control)
Require custom devices
Single mode interaction
2D input for 3D (screen based interaction)
No understanding of real world
Explicit vs. implicit interaction
Unintelligent interfaces (no learning)
29. AR MicroMachines
AR experience with environment awareness
and physically-based interaction
Based on MS Kinect RGB-D sensor
Augmented environment supports
occlusion, shadows
physically-based interaction between real and
virtual objects
32. System Flow
The system flow consists of three sections:
Image Processing and Marker Tracking
Physics Simulation
Rendering
33. Physics Simulation
Create virtual mesh over real world
Update at 10 fps – can move real objects
Used by the physics engine for collision detection (virtual/real)
Used by OpenSceneGraph for occlusion and shadows
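The real/virtual collision idea on this slide can be sketched as follows. The actual system builds a trimesh over the Kinect data and hands it to a physics engine; this simplified, stdlib-only height-grid test (all names invented for illustration) only shows the core check: a virtual object collides when it dips below the captured real surface.

```cpp
#include <cassert>
#include <vector>

// Sketch of the "virtual mesh over the real world" idea: a height grid
// sampled from depth data acts as a collision surface for virtual objects.
// (The real system uses a trimesh with a physics engine; this heightfield
// stand-in only illustrates the real/virtual collision test.)
struct HeightGrid {
    int w, h;
    std::vector<float> z;                       // surface height per cell
    float at(int x, int y) const { return z[y * w + x]; }
};

// A virtual ball collides with the captured surface when its lowest point
// reaches or drops below the terrain height at its cell.
bool collidesWithTerrain(const HeightGrid& g, int x, int y,
                         float ballZ, float radius) {
    if (x < 0 || x >= g.w || y < 0 || y >= g.h) return false;
    return (ballZ - radius) <= g.at(x, y);
}
```

Updating the grid each frame (the slide's 10 fps) is what lets moving real objects push virtual ones around.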
36. Motivation
AR MicroMachines and PhobiAR
• Treated the environment as
static – no tracking
• Tracked objects in 2D
More realistic interaction requires 3D gesture tracking
37. Motivation
Occlusion Issues
AR MicroMachines only achieved realistic occlusion because the user’s viewpoint matched the Kinect’s
Proper occlusion requires a more complete model of scene objects
39. HITLabNZ’s Gesture Library
Architecture
o Supports PCL, OpenNI, OpenCV, and Kinect SDK.
o Provides access to depth, RGB, XYZRGB.
o Usage: Capturing color image, depth image and
concatenated point clouds from a single or multiple cameras
o For example:
Kinect for Xbox 360
Kinect for Windows
Asus Xtion Pro Live
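Producing XYZRGB point clouds from a depth camera, as this module does, comes down to back-projecting each depth pixel through the pinhole intrinsics. This is a generic sketch, not the library's actual API; the intrinsic values in the test are illustrative.

```cpp
#include <cassert>
#include <cmath>

struct Point3 { float x, y, z; };

// Back-project a depth pixel (u, v, depth) to a 3D point using pinhole
// camera intrinsics: focal lengths (fx, fy) and principal point (cx, cy).
Point3 depthPixelToPoint(int u, int v, float depthMeters,
                         float fx, float fy, float cx, float cy) {
    Point3 p;
    p.z = depthMeters;
    p.x = (u - cx) * depthMeters / fx;
    p.y = (v - cy) * depthMeters / fy;
    return p;
}
```

Pairing each point with the RGB value at the same pixel yields the concatenated XYZRGB cloud the slide mentions.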
40. HITLabNZ’s Gesture Library
Architecture
o Segment images and point clouds based on color, depth and
space.
o Usage: Segmenting images or point clouds using color
models, depth, or spatial properties such as location, shape
and size.
o For example:
Skin color segmentation
Depth threshold
41. HITLabNZ’s Gesture Library
Architecture
o Identify and track objects between frames based on
XYZRGB.
o Usage: Identifying current position/orientation of the
tracked object in space.
o For example:
Training set of hand poses; colors represent unique regions of the hand.
Raw output (without cleaning) classified on real hand input (depth image).
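Identifying an object "between frames" can be sketched as nearest-neighbour matching of segment centroids: each new detection is associated with the closest previous one, or treated as new if nothing is close enough. This is a deliberately simplified stand-in for the library's XYZRGB-based tracking.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Centroid { float x, y, z; };

// Return the index in `prev` closest to `cur`, or -1 when no previous
// centroid lies within `maxDist` (i.e. treat it as a new object).
int matchToPrevious(const Centroid& cur, const std::vector<Centroid>& prev,
                    float maxDist) {
    int best = -1;
    float bestD = maxDist;
    for (size_t i = 0; i < prev.size(); ++i) {
        float dx = cur.x - prev[i].x;
        float dy = cur.y - prev[i].y;
        float dz = cur.z - prev[i].z;
        float d = std::sqrt(dx * dx + dy * dy + dz * dz);
        if (d <= bestD) { bestD = d; best = static_cast<int>(i); }
    }
    return best;
}
```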
42. HITLabNZ’s Gesture Library
Architecture
o Hand Recognition/Modeling
Skeleton based (for low resolution
approximation)
Model based (for more accurate
representation)
o Object Modeling (identification and tracking of rigid-body objects)
o Physical Modeling (physical interaction)
Sphere Proxy
Model based
Mesh based
o Usage: For general spatial interaction in AR/VR
environment
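The sphere-proxy option above (also described in the editor's notes, where real objects are represented as collections of spheres for the physics engine) can be illustrated like this; the code is an invented reimplementation of the idea, not the library's API.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Sphere { float x, y, z, r; };

// A tracked hand is approximated by a set of proxy spheres so contacts can
// be resolved cheaply. A virtual object (itself a sphere here) touches the
// hand when it intersects any proxy sphere.
bool touchesProxy(const std::vector<Sphere>& hand, const Sphere& obj) {
    for (const Sphere& s : hand) {
        float dx = s.x - obj.x;
        float dy = s.y - obj.y;
        float dz = s.z - obj.z;
        float d = std::sqrt(dx * dx + dy * dy + dz * dz);
        if (d <= s.r + obj.r) return true;
    }
    return false;
}
```

Mesh- or model-based contact is more accurate but costlier, which is why the proxy variant exists for real-time interaction.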
46. HITLabNZ’s Gesture Library
Architecture
o Static (hand pose recognition)
o Dynamic (meaningful movement recognition)
o Context-based gesture recognition (gestures with context,
e.g. pointing)
o Usage: Issuing commands/anticipating user intention and
high level interaction.
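Static pose recognition, the first item above, can be sketched as nearest-template matching on a pose feature vector (e.g. per-finger extension values). Real recognisers use trained classifiers; the feature encoding and labels below are invented for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

struct PoseTemplate { std::string label; std::vector<float> features; };

// Return the label of the template with the smallest squared distance to
// the observed feature vector (assumes equal-length vectors).
std::string classifyPose(const std::vector<float>& f,
                         const std::vector<PoseTemplate>& templates) {
    std::string best = "unknown";
    float bestD = 1e9f;
    for (const auto& t : templates) {
        float d = 0.f;
        for (size_t i = 0; i < f.size(); ++i) {
            float diff = f[i] - t.features[i];
            d += diff * diff;
        }
        if (d < bestD) { bestD = d; best = t.label; }
    }
    return best;
}
```

Dynamic and context-based recognition layer on top of this: sequences of poses over time, plus scene context such as what a pointing pose is aimed at.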
48. Multimodal Interaction
Combined speech input
Gesture and speech are complementary
Speech
- modal commands, quantities
Gesture
- selection, motion, qualities
Previous work found multimodal interfaces
intuitive for 2D/3D graphics interaction
49. 1. Marker Based Multimodal Interface
Add speech recognition to VOMAR
Paddle + speech commands
50.
51. Commands Recognized
Create Command "Make a blue chair": to create a virtual
object and place it on the paddle.
Duplicate Command "Copy this": to duplicate a virtual object
and place it on the paddle.
Grab Command "Grab table": to select a virtual object and
place it on the paddle.
Place Command "Place here": to place the attached object in
the workspace.
Move Command "Move the couch": to attach a virtual object
in the workspace to the paddle so that it follows the paddle
movement.
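The command set above amounts to a keyword-first grammar: the leading verb selects the action and the remainder names the object. A minimal sketch of dispatching the recogniser's output (hypothetical stand-in, not the system's actual speech handling):

```cpp
#include <cassert>
#include <string>

enum class Action { Create, Duplicate, Grab, Place, Move, Unknown };

// Map an utterance to an action by its leading verb, mirroring the
// "Make / Copy / Grab / Place / Move" commands on the slide.
Action parseCommand(const std::string& utterance) {
    if (utterance.rfind("Make", 0) == 0)  return Action::Create;
    if (utterance.rfind("Copy", 0) == 0)  return Action::Duplicate;
    if (utterance.rfind("Grab", 0) == 0)  return Action::Grab;
    if (utterance.rfind("Place", 0) == 0) return Action::Place;
    if (utterance.rfind("Move", 0) == 0)  return Action::Move;
    return Action::Unknown;
}
```

In the multimodal interface the paddle gesture then supplies what speech leaves implicit, e.g. where "here" is for the Place command.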
62. Results
Average performance time (MMI, speech fastest)
Gesture: 15.44s
Speech: 12.38s
Multimodal: 11.78s
No difference in user errors
User subjective survey
Q1: How natural was it to manipulate the object?
- MMI, speech significantly better
70% preferred MMI, 25% speech only, 5% gesture only
67. Intelligent Interfaces
Most AR systems are unintelligent
Don’t recognize user behaviour
Don’t provide feedback
Don’t adapt to user
Especially important for training
Scaffolded learning
Moving beyond check-lists of actions
68. Intelligent Interfaces
AR interface + intelligent tutoring system
ASPIRE constraint-based system (from UC)
Constraints
- relevance cond., satisfaction cond., feedback
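A constraint in this style pairs a relevance condition with a satisfaction condition: when the constraint applies to the current state but is not satisfied, its feedback is shown. The state type and example constraint below are invented to illustrate the pattern, not taken from ASPIRE.

```cpp
#include <cassert>
#include <functional>
#include <string>

struct State { bool partPlaced; bool correctOrientation; };

struct Constraint {
    std::function<bool(const State&)> relevance;    // when does this apply?
    std::function<bool(const State&)> satisfaction; // is the step correct?
    std::string feedback;                           // shown on violation
};

// Return feedback text when the constraint is relevant but violated,
// and an empty string otherwise.
std::string check(const Constraint& c, const State& s) {
    if (c.relevance(s) && !c.satisfaction(s)) return c.feedback;
    return "";
}
```

This is what moves the tutor beyond a checklist: feedback is tied to the state the learner actually produced, not to a fixed action sequence.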
72. Evaluation Results
16 subjects, with and without ITS
Improved task completion
Improved learning
73. Intelligent Agents
AR characters
Virtual embodiment of system
Multimodal input/output
Examples
AR Lego, Welbo, etc
Mr Virtuoso
- AR character more real, more fun
- On-screen 3D and AR similar in usefulness
75. Conclusions
AR traditionally involves tangible interaction
New technologies support natural interaction
Environment capture
Natural gestures
Multimodal interaction
Opportunities for future research
Mobile, intelligent systems, characters
76. More Information
• Mark Billinghurst
– mark.billinghurst@hitlabnz.org
• Website
– http://www.hitlabnz.org/
Editor's Notes
- To create an interaction volume, the Kinect is positioned above the desired interaction space facing downwards. - A reference marker is placed in the interaction space to calculate the transform between the Kinect coordinate system and the coordinate system used by the AR viewing camera. - Users can also wear color markers on their fingers for pre-defined gesture interaction.
- The OpenSceneGraph framework is used for rendering. The input video image is rendered as the background, with all the virtual objects rendered on top. - At the top level of the scene graph, the viewing transformation is applied such that all virtual objects are transformed so as to appear attached to the real world. - The trimesh is rendered as an array of quads, with an alpha value of zero. This allows realistic occlusion effects of the terrain and virtual objects, while not affecting the users’ view of the real environment. - A custom fragment shader was written to allow rendering of shadows to the invisible terrain.
Appearance-based interaction has been used at the Lab before, both in AR MicroMachines and PhobiAR. Flaws in these applications have motivated my work on advanced tracking and modeling. AR MicroMachines did not allow for dynamic interaction – a car could be picked up, but because the motion of the hand was not known, friction could not be simulated between the car and the hand. PhobiAR introduced tracking for dynamic interaction, but it really only tracked objects in 2D. I'll show you what I mean… As soon as the hand is flipped, the tracking fails and the illusion of realistic interaction is broken. 3D tracking was required to make the interaction in both of these applications more realistic.
Another issue with typical AR applications is the handling of occlusion. The Kinect allows a model of the environment to be developed, which can help in determining whether a real object is in front of a virtual one. MicroMachines had good success by assuming a situation such as that shown on the right, with all objects in the scene in contact with the ground. This was a fair assumption when most of the objects were books etc. However, in PhobiAR the user's hands were often above the ground, more like the scene on the left. The thing to notice is that these two scenes are indistinguishable from the Kinect's point of view, but completely different from the observer's point of view. The main problem is that we don't know enough about the shape of real-world objects to handle occlusion properly. My work aims to model real-world objects by combining views of the objects across multiple frames, allowing better occlusion.
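The per-pixel occlusion rule implied here can be stated very compactly: draw a virtual fragment only when it is nearer to the camera than the real surface measured by the depth sensor at that pixel. In practice this runs in a shader against the depth buffer; the function below is a CPU-side illustration with an invented name.

```cpp
#include <cassert>

// Decide whether a virtual fragment is visible at a pixel, given its depth
// and the sensed depth of the real surface at the same pixel (in metres).
bool drawVirtualFragment(float virtualDepth, float realDepth) {
    // Sensor holes are commonly reported as depth 0: treat them as "no
    // real surface measured" and let the virtual content show through.
    if (realDepth <= 0.f) return true;
    return virtualDepth < realDepth;
}
```

The limitation discussed above follows directly: the rule is only correct when the sensed depth matches the viewer's viewpoint, which is why a fuller model of scene objects is needed for other viewpoints.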
The gesture library will provide a C++ API for real-time recognition and tracking of hands and rigid-body objects in 3D environments. The library will support usage of single and multiple depth sensing cameras. Collision detection and physics simulation will be integrated for realistic physical interaction. Finally, learning algorithms will be implemented for recognizing hand gestures.
The library will support usage of single and multiple depth sensing cameras. Aim for general consumer hardware.
Interaction between real objects and the virtual balls was achieved by representing objects as collections of spheres. The location of the spheres was determined by the modeling stage while their motion was found during tracking. I used the Bullet physics engine for physics simulation.
The AR scene was rendered using OpenSceneGraph. Because the Kinect’s viewpoint was also the user’s viewpoint, realistic occlusion was possible using the Kinect’s depth data. I did not have time to experiment with using the object models to improve occlusion from other viewpoints. Also, the addition of shadows could have significantly improved the realism of the application.