Natural Interfaces for
 Augmented Reality

      Mark Billinghurst
       HIT Lab NZ
  University of Canterbury
Augmented Reality Definition
 Defining Characteristics [Azuma 97]
   Combines Real and Virtual Images
     - Both can be seen at the same time
   Interactive in real-time
     - The virtual content can be interacted with
   Registered in 3D
     - Virtual objects appear fixed in space
AR Today
 Most widely used AR is mobile or web based
 Mobile AR
   Outdoor AR (GPS + compass)
     - Layar (10 million+ users), Junaio, etc
   Indoor AR (image based tracking)
     - QCAR, String etc
 Web based (Flash)
   FLARToolKit marker tracking
   Markerless tracking
AR Interaction
 You can see spatially registered AR..
          how can you interact with it?
AR Interaction Today
 Mostly simple interaction
 Mobile
   Outdoor (Junaio, Layar, Wikitude, etc)
     - Viewing information in place, touch virtual tags
   Indoor (Invizimals, Qualcomm demos)
     - Change viewpoint, screen based (touch screen)
 Web based
   Change viewpoint, screen interaction (mouse)
History of AR Interaction
1. AR Information Viewing
 Information is registered to
  real-world context
    Hand held AR displays
 Interaction
    Manipulation of a window
     into information space
    2D/3D virtual viewpoint control
 Applications
    Context-aware information displays
 Examples
    NaviCam, Cameleon, etc
 [Image: NaviCam, Rekimoto et al. 1997]
Current AR Information Browsers
 Mobile AR
   GPS + compass
 Many Applications
     Layar
     Wikitude
     Acrossair
     PressLite
     Yelp
     AR Car Finder
     …
2. 3D AR Interfaces
 Virtual objects displayed in 3D
  physical space and manipulated
    HMDs and 6DOF head-tracking
    6DOF hand trackers for input
 Interaction
    Viewpoint control
    Traditional 3D UI interaction:
      manipulation, selection, etc.
 [Image: Kiyokawa et al. 2000]
 Requires custom input devices
VLEGO - AR 3D Interaction
3. Augmented Surfaces and
Tangible Interfaces
 Basic principles
   Virtual objects are projected
    on a surface
   Physical objects are used as
    controls for virtual objects
   Support for collaboration
Augmented Surfaces
 Rekimoto, et al. 1998
   Front projection
   Marker-based tracking
   Multiple projection surfaces
Tangible User Interfaces (Ishii 97)
 Create digital shadows
  for physical objects
 Foreground
   graspable UI
 Background
   ambient interfaces
Tangible Interface: ARgroove
 Collaborative Instrument
 Exploring Physically Based Interaction
    Move and track physical record
     Map physical actions to MIDI output
       - Translation, rotation
       - Tilt, shake
 Limitation
    AR output shown on screen
    Separation between input and output
Lessons from Tangible Interfaces
 Benefits
    Physical objects make us smart (affordances, constraints)
    Objects aid collaboration (shared meaning)
    Objects increase understanding (cognitive artifacts)
 Limitations
    Difficult to change object properties
    Limited display capabilities (project onto surface)
    Separation between object and display
4. Tangible AR
 AR overcomes limitations of TUIs
   enhance display possibilities
   merge task/display space
   provide public and private views


 TUI + AR = Tangible AR
   Apply TUI methods to AR interface design
Example Tangible AR Applications
 Use of natural physical object manipulations to
  control virtual objects
 LevelHead (Oliver)
    Physical cubes become rooms
 VOMAR (Kato 2000)
    Furniture catalog book:
      - Turn over the page to see new models
    Paddle interaction:
      - Push, shake, incline, hit, scoop
VOMAR Interface
Evolution of AR Interaction
1. Information Viewing Interfaces
   simple (conceptually!), unobtrusive
2. 3D AR Interfaces
   expressive, creative, require attention
3. Tangible Interfaces
   Embedded into conventional environments
4. Tangible AR
   Combines TUI input + AR display
Limitations
 Typical limitations
     Simple/No interaction (viewpoint control)
     Require custom devices
     Single mode interaction
     2D input for 3D (screen based interaction)
     No understanding of real world
     Explicit vs. implicit interaction
     Unintelligent interfaces (no learning)
Natural Interaction
The Vision of AR
To Make the Vision Real..
 Hardware/software requirements
     Contact lens displays
     Free space hand/body tracking
     Environment recognition
     Speech/gesture recognition
     Etc..
Natural Interaction
 Automatically detecting real environment
   Environmental awareness
   Physically based interaction
 Gesture Input
   Free-hand interaction
 Multimodal Input
   Speech and gesture interaction
   Implicit rather than Explicit interaction
Environmental Awareness
AR MicroMachines
 AR experience with environment awareness
  and physically-based interaction
   Based on MS Kinect RGB-D sensor
 Augmented environment supports
   occlusion, shadows
   physically-based interaction between real and
    virtual objects
Operating Environment
Architecture
 Our framework uses five libraries:

     OpenNI
     OpenCV
     OPIRA
     Bullet Physics
     OpenSceneGraph
System Flow
 The system flow consists of three sections:
    Image Processing and Marker Tracking
    Physics Simulation
    Rendering
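 A minimal C++ sketch of how these three stages could be sequenced each frame; the Frame type and stage functions are hypothetical stubs standing in for the real modules:

    // Hypothetical per-frame loop illustrating the three stages of the system flow.
    struct Frame { /* colour image, depth image, marker-derived camera pose */ };

    Frame capture()                     { return Frame(); }  // image processing + marker tracking
    void  simulatePhysics(const Frame&) {}                   // rebuild terrain mesh, step the physics world
    void  render(const Frame&)          {}                   // draw video background, virtual content

    int main()
    {
        for (;;) {                     // runs until the application exits
            Frame f = capture();       // 1. image processing and marker tracking
            simulatePhysics(f);        // 2. physics simulation
            render(f);                 // 3. rendering
        }
    }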
Physics Simulation




 Create virtual mesh over real world
 Update at 10 fps – can move real objects
 Used by the physics engine for collision detection (virtual/real)
 Used by OpenSceneGraph for occlusion and shadows
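 A sketch of how a depth-derived terrain mesh can be handed to Bullet as a static collision shape so that virtual rigid bodies collide with the real scene; the Vertex type, function name, and index layout are assumptions, only the overall idea follows the slides:

    #include <btBulletDynamicsCommon.h>
    #include <cstddef>
    #include <vector>

    // Hypothetical vertex format produced by the depth-to-mesh step (metres, camera frame).
    struct Vertex { float x, y, z; };

    // Build a static Bullet collision body from the reconstructed real-world mesh.
    btRigidBody* makeTerrainBody(const std::vector<Vertex>& verts,
                                 const std::vector<int>& indices)
    {
        btTriangleMesh* tris = new btTriangleMesh();
        for (std::size_t i = 0; i + 2 < indices.size(); i += 3) {
            const Vertex& a = verts[indices[i]];
            const Vertex& b = verts[indices[i + 1]];
            const Vertex& c = verts[indices[i + 2]];
            tris->addTriangle(btVector3(a.x, a.y, a.z),
                              btVector3(b.x, b.y, b.z),
                              btVector3(c.x, c.y, c.z));
        }
        btBvhTriangleMeshShape* shape = new btBvhTriangleMeshShape(tris, true);
        // Mass 0 makes the body static; the mesh is rebuilt on each update (~10 fps)
        // rather than moved, so real objects can be repositioned between updates.
        btRigidBody::btRigidBodyConstructionInfo info(0.0f, new btDefaultMotionState(), shape);
        return new btRigidBody(info);
    }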
Rendering




[Images: occlusion (left) and shadows (right)]
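 One common way to get this effect in OpenSceneGraph, consistent with the alpha-zero trimesh described in the notes, is to render the reconstructed real-world mesh into the depth buffer only, so it occludes virtual objects without covering the live video; a sketch, not the project's exact shader setup:

    #include <osg/ColorMask>
    #include <osg/Node>
    #include <osg/StateSet>

    // Make a node write depth but no colour: real-world geometry then hides
    // virtual objects behind it while the video background stays visible.
    void makeDepthOnlyOccluder(osg::Node* node)
    {
        osg::StateSet* ss = node->getOrCreateStateSet();
        // Disable writes to all colour channels; depth writes remain enabled.
        ss->setAttributeAndModes(new osg::ColorMask(false, false, false, false),
                                 osg::StateAttribute::ON);
        // Draw the occluder before the virtual content.
        ss->setRenderBinDetails(-1, "RenderBin");
    }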
Natural Gesture Interaction

 HIT Lab NZ AR Gesture Library
Motivation
 AR MicroMachines and PhobiAR
   AR MicroMachines treated the environment as static – no tracking
   PhobiAR tracked objects only in 2D
 More realistic interaction requires 3D gesture tracking
Motivation
 Occlusion Issues
   AR MicroMachines only achieved realistic occlusion because the user’s viewpoint matched the Kinect’s
   Proper occlusion requires a more complete model of scene objects
HITLabNZ’s Gesture Library




Architecture
HITLabNZ’s Gesture Library




Architecture
   o   Supports PCL, OpenNI, OpenCV, and Kinect SDK.
   o   Provides access to depth, RGB, XYZRGB.
   o   Usage: Capturing color image, depth image and
       concatenated point clouds from a single or multiple cameras
   o   For example:




                                 Kinect for Xbox 360


                                  Kinect for Windows


                                 Asus Xtion Pro Live
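 A minimal capture sketch using PCL's OpenNI grabber, one of the supported back ends listed above; the callback name and single-camera setup are assumptions:

    #include <pcl/io/openni_grabber.h>
    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <boost/function.hpp>
    #include <boost/thread/thread.hpp>

    // Receives one XYZRGB(A) point cloud per frame from the depth sensor.
    void onCloud(const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr& cloud)
    {
        // Hand the frame to the segmentation / tracking stages here.
    }

    int main()
    {
        pcl::OpenNIGrabber grabber;  // one Kinect/Xtion; use several grabbers for multi-camera capture
        boost::function<void (const pcl::PointCloud<pcl::PointXYZRGBA>::ConstPtr&)> cb = &onCloud;
        grabber.registerCallback(cb);
        grabber.start();
        while (true)
            boost::this_thread::sleep(boost::posix_time::seconds(1));  // frames arrive on the grabber thread
    }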
HITLabNZ’s Gesture Library




Architecture
  o    Segment images and point clouds based on color, depth and
       space.
  o    Usage: Segmenting images or point clouds using color
       models, depth, or spatial properties such as location, shape
       and size.
  o    For example:




                                   Skin color segmentation



                                       Depth threshold
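 A rough sketch of the two cues named above using OpenCV; the HSV skin range and the 1 m depth cut-off are illustrative values, not the library's tuned parameters:

    #include <opencv2/opencv.hpp>

    // Combine a loose skin-colour mask with a near-depth mask to isolate the hand.
    cv::Mat segmentHand(const cv::Mat& bgr, const cv::Mat& depthMm)  // depth in millimetres (CV_16UC1)
    {
        cv::Mat hsv, skinMask, depthMask, mask;
        cv::cvtColor(bgr, hsv, cv::COLOR_BGR2HSV);

        // Keep pixels whose hue/saturation/value fall inside a loose skin range.
        cv::inRange(hsv, cv::Scalar(0, 40, 60), cv::Scalar(25, 180, 255), skinMask);

        // Keep pixels closer than roughly 1 m to the sensor.
        cv::inRange(depthMm, cv::Scalar(1), cv::Scalar(1000), depthMask);

        cv::bitwise_and(skinMask, depthMask, mask);
        return mask;  // binary mask of candidate hand pixels
    }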
HITLabNZ’s Gesture Library




Architecture
  o    Identify and track objects between frames based on
       XYZRGB.
  o    Usage: Identifying current position/orientation of the
       tracked object in space.
  o    For example:



 [Image: training set of hand poses; colors represent unique regions of the hand]
 [Image: raw output (without cleaning) classified on real hand input (depth image)]
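 A deliberately reduced stand-in for this stage: once the hand cloud has been segmented, its per-frame position can be reported as the cloud centroid with PCL; the real module also classifies hand regions and recovers orientation, which this sketch omits:

    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <pcl/common/centroid.h>
    #include <Eigen/Core>

    // Report the tracked hand position for the current frame as the cloud centroid.
    Eigen::Vector4f trackPosition(const pcl::PointCloud<pcl::PointXYZRGB>& handCloud)
    {
        Eigen::Vector4f centroid;
        pcl::compute3DCentroid(handCloud, centroid);  // homogeneous (x, y, z, 1)
        return centroid;
    }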
HITLabNZ’s Gesture Library




Architecture
   o Hand Recognition/Modeling
        Skeleton based (for low resolution
          approximation)
        Model based (for more accurate
          representation)
   o Object Modeling (identification and tracking of rigid-body objects)
   o Physical Modeling (physical interaction)
        Sphere Proxy
        Model based
        Mesh based
   o Usage: For general spatial interaction in AR/VR
     environment
Method
Represent models as collections of spheres moving with
   the models in the Bullet physics engine
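 A sketch of the sphere-proxy idea in Bullet, with placeholder centres and a single shared radius; in the library the sphere layout comes from the modelling stage and its motion from tracking:

    #include <btBulletDynamicsCommon.h>
    #include <vector>

    // Approximate a tracked hand or object as a kinematic compound of spheres.
    btRigidBody* makeSphereProxy(const std::vector<btVector3>& centres, float radius)
    {
        btCompoundShape* compound = new btCompoundShape();
        for (const btVector3& c : centres) {
            btTransform local;
            local.setIdentity();
            local.setOrigin(c);
            compound->addChildShape(local, new btSphereShape(radius));
        }
        // Kinematic: its transform is set from tracking each frame, and Bullet
        // resolves collisions between it and the dynamic virtual objects.
        btRigidBody::btRigidBodyConstructionInfo info(0.0f, new btDefaultMotionState(), compound);
        btRigidBody* body = new btRigidBody(info);
        body->setCollisionFlags(body->getCollisionFlags() | btCollisionObject::CF_KINEMATIC_OBJECT);
        body->setActivationState(DISABLE_DEACTIVATION);
        return body;
    }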
Method
Render AR scene with OpenSceneGraph, using depth map
   for occlusion




              Shadows yet to be implemented
Results
HITLabNZ’s Gesture Library




Architecture
  o   Static (hand pose recognition)
  o   Dynamic (meaningful movement recognition)
  o   Context-based gesture recognition (gestures with context,
      e.g. pointing)
  o   Usage: Issuing commands/anticipating user intention and
      high level interaction.
Multimodal Interaction
Multimodal Interaction
 Combined speech input
 Gesture and speech are complementary
   Speech
     - modal commands, quantities
   Gesture
     - selection, motion, qualities
 Previous work found multimodal interfaces
  intuitive for 2D/3D graphics interaction
1. Marker Based Multimodal Interface




  Add speech recognition to VOMAR
  Paddle + speech commands
Commands Recognized
 Create Command "Make a blue chair": to create a virtual
  object and place it on the paddle.
 Duplicate Command "Copy this": to duplicate a virtual object
  and place it on the paddle.
 Grab Command "Grab table": to select a virtual object and
  place it on the paddle.
 Place Command "Place here": to place the attached object in
  the workspace.
 Move Command "Move the couch": to attach a virtual object
  in the workspace to the paddle so that it follows the paddle
  movement.
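 A toy sketch of how these speech commands could be mapped to actions before fusion with the paddle context; the enum, prefix matching, and deictic resolution note are all hypothetical, and a real system would act on the speech recognizer's grammar output rather than raw strings:

    #include <iostream>
    #include <string>

    enum class Action { Create, Duplicate, Grab, Place, Move, None };

    // Map a lower-case speech result to one of the five command types.
    Action parseCommand(const std::string& speech)
    {
        if (speech.rfind("make", 0) == 0)  return Action::Create;     // "make a blue chair"
        if (speech.rfind("copy", 0) == 0)  return Action::Duplicate;  // "copy this"
        if (speech.rfind("grab", 0) == 0)  return Action::Grab;       // "grab table"
        if (speech.rfind("place", 0) == 0) return Action::Place;      // "place here"
        if (speech.rfind("move", 0) == 0)  return Action::Move;       // "move the couch"
        return Action::None;
    }

    int main()
    {
        // The deictic part ("this", "here") would be resolved from the paddle pose,
        // e.g. whichever virtual object or location the paddle currently points at.
        std::cout << static_cast<int>(parseCommand("copy this")) << "\n";  // prints 1 (Duplicate)
    }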
System Architecture
Object Relationships




"Put chair behind the table”
Where is behind?
                               View specific regions
User Evaluation
 Performance time
    Speech + static paddle significantly faster




 Gesture-only condition less accurate for position/orientation
 Users preferred speech + paddle input
Subjective Surveys
2. Free Hand Multimodal Input
 Use free hand to interact with AR content
 Recognize simple gestures
 No marker tracking




        Point         Move          Pick/Drop
Multimodal Architecture
Multimodal Fusion
Hand Occlusion
User Evaluation



 Change object shape, colour and position
 Conditions
   Speech only, gesture only, multimodal
 Measure
   performance time, error, subjective survey
Experimental Setup




Change object shape
  and colour
Results
 Average performance time (MMI, speech fastest)
   Gesture: 15.44s
   Speech: 12.38s
   Multimodal: 11.78s
 No difference in user errors
 User subjective survey
   Q1: How natural was it to manipulate the object?
     - MMI, speech significantly better
   70% preferred MMI, 25% speech only, 5% gesture only
Future Directions
Future Research
   Mobile real world capture
   Mobile gesture input
   Intelligent interfaces
   Virtual characters
Natural Gesture Interaction on Mobile




 Use mobile camera for hand tracking
   Fingertip detection
Evaluation




 Gesture input more than twice as slow as touch
 No difference in naturalness
Intelligent Interfaces
 Most AR systems stupid
   Don’t recognize user behaviour
   Don’t provide feedback
   Don’t adapt to user
 Especially important for training
   Scaffolded learning
   Moving beyond check-lists of actions
Intelligent Interfaces




 AR interface + intelligent tutoring system
   ASPIRE constraint-based system (from UC)
   Constraints
     - relevance condition, satisfaction condition, feedback
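 A minimal sketch of the constraint idea, with each constraint holding a relevance condition, a satisfaction condition, and feedback text; the AssemblyState type and the evaluation loop are assumptions for illustration, not ASPIRE's actual representation:

    #include <functional>
    #include <string>
    #include <vector>

    // Hypothetical snapshot of the learner's current solution in the AR task.
    struct AssemblyState { std::vector<std::string> placedParts; };

    struct Constraint {
        std::function<bool(const AssemblyState&)> relevance;     // when does this constraint apply?
        std::function<bool(const AssemblyState&)> satisfaction;  // what must then be true?
        std::string feedback;                                    // corrective message if violated
    };

    // Collect feedback for every relevant constraint that is not satisfied.
    std::vector<std::string> evaluate(const AssemblyState& s,
                                      const std::vector<Constraint>& constraints)
    {
        std::vector<std::string> messages;
        for (const auto& c : constraints)
            if (c.relevance(s) && !c.satisfaction(s))
                messages.push_back(c.feedback);
        return messages;
    }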
Domain Ontology
Intelligent Feedback




 Actively monitors user behaviour
    Implicit vs. explicit interaction
 Provides corrective feedback
Evaluation Results
 16 subjects, with and without ITS
 Improved task completion




 Improved learning
Intelligent Agents
 AR characters
   Virtual embodiment of system
   Multimodal input/output
 Examples
   AR Lego, Welbo, etc
   Mr Virtuoso
     - AR character more real, more fun
     - On-screen 3D and AR similar in usefulness
Conclusions
Conclusions
 AR traditionally involves tangible interaction
 New technologies support natural interaction
   Environment capture
   Natural gestures
   Multimodal interaction
 Opportunities for future research
   Mobile, intelligent systems, characters
More Information
• Mark Billinghurst
  – mark.billinghurst@hitlabnz.org
• Website
  – http://www.hitlabnz.org/


Editor's Notes

  • #31 - To create an interaction volume, the Kinect is positioned above the desired interaction space facing downwards. - A reference marker is placed in the interaction space to calculate the transform between the Kinect coordinate system and the coordinate system used by the AR viewing camera. - Users can also wear color markers on their fingers for pre-defined gesture interaction.
  • #35 - The OpenSceneGraph framework is used for rendering. The input video image is rendered as the background, with all the virtual objects rendered on top. - At the top level of the scene graph, the viewing transformation is applied such that all virtual objects are transformed so as to appear attached to the real world. - The trimesh is rendered as an array of quads, with an alpha value of zero. This allows realistic occlusion effects of the terrain and virtual objects, while not affecting the users’ view of the real environment. - A custom fragment shader was written to allow rendering of shadows to the invisible terrain.
  • #37 Appearance-based interaction has been used at the Lab before, both in AR Micromachines and PhobiAR. Flaws in these applications have motivated my work on advanced tracking and modeling. AR Micromachines did not allow for dynamic interaction – a car could be picked up, but because the motion of the hand was not known, friction could not be simulated between the car and the hand. PhobiAR introduced tracking for dynamic interaction, but it really only tracked objects in 2D. I’ll show you what I mean.. As soon as the hand is flipped the tracking fails and the illusion of realistic interaction is broken. 3D tracking was required to make the interaction in both of these applications more realistic
  • #38 Another issue with typical AR applications is the handling of occlusion. The Kinect allows a model of the environment to be developed, which can help in determining whether a real object is in front of a virtual one. MicroMachines had good success by assuming a situation such as that shown on the right, with all objects in the scene in contact with the ground. This was a fair assumption when most of the objects were books etc. However, in PhobiAR the user’s hands were often above the ground, more like the scene on the left. The thing to notice is that these two scenes are indistinguishable from the Kinect’s point of view, but completely different from the observer’s point of view. The main problem is that we don’t know enough about the shape of real-world objects to handle occlusion properly. My work aims to model real-world objects by combining views of the objects across multiple frames, allowing better occlusion.
  • #39 The gesture library will provide a C++ API for real-time recognition and tracking of hands and rigid-body objects in 3D environments. The library will support usage of single and multiple depth sensing cameras. Collision detection and physics simulation will be integrated for realistic physical interaction. Finally, learning algorithms will be implemented for recognizing hand gestures.
  • #40 The library will support usage of single and multiple depth sensing cameras. Aim for general consumer hardware.
  • #44 Interaction between real objects and the virtual balls was achieved by representing objects as collections of spheres. The location of the spheres was determined by the modeling stage while their motion was found during tracking. I used the Bullet physics engine for physics simulation.
  • #45 The AR scene was rendered using OpenSceneGraph. Because the Kinect’s viewpoint was also the user’s viewpoint, realistic occlusion was possible using the Kinect’s depth data. I did not have time to experiment with using the object models to improve occlusion from other viewpoints. Also, the addition of shadows could have significantly improved the realism of the application.