426 Lecture 9: Research Directions in AR
The final lecture in the COSC 426 graduate course in Augmented Reality. Taught by Mark Billinghurst from the HIT Lab NZ at the University of Canterbury on Sept. 19th 2012

Presentation Transcript

  • COSC 426: Augmented Reality Mark Billinghurst mark.billinghurst@hitlabnz.org Sept 19th 2012 Lecture 9: AR Research Directions
  • Looking to the Future
  • The Future is with us  It takes at least 20 years for new technologies to go from the lab to the lounge. “The technologies that will significantly affect our lives over the next 10 years have been around for a decade. The future is with us. The trick is learning how to spot it. The commercialization of research, in other words, is far more about prospecting than alchemy.” — Bill Buxton, Oct 11th 2004
  • Research Directions experiences Usability applications Interaction tools Authoring components Tracking, Display Sony CSL © 2004
  • Research Directions  Components   Markerless tracking, hybrid tracking   Displays, input devices  Tools   Authoring tools, user generated content  Applications   Interaction techniques/metaphors  Experiences   User evaluation, novel AR/MR experiences
  • HMD Design
  • Occlusion with See-through HMD  The Problem   Occluding real objects with virtual   Occluding virtual objects with real  (Images: real scene vs. current see-through HMD)
  • ELMO (Kiyokawa 2001)  Occlusive see-through HMD   Masking LCD   Real time range finding
  • ELMO Demo
  • ELMO Design  Components: virtual images from LCD, depth sensing, LCD mask, real world, optical combiner   Use LCD mask to block real world   Depth sensing for occluding virtual images
  • ELMO Results
  • Future Displays  Always on, unobtrusive
  • Google Glasses
  • Contact Lens Display  Babak Parviz   University of Washington  MEMS components   Transparent elements   Micro-sensors  Challenges   Miniaturization   Assembly   Eye safety
  • Contact Lens Prototype
  • Applications
  • Interaction Techniques  Input techniques   3D vs. 2D input   Pen/buttons/gestures  Natural Interaction   Speech + gesture input  Intelligent Interfaces   Artificial agents   Context sensing
  • Flexible Displays  Flexible Lens Surface   Bimanual interaction   Digital paper analogy Red Planet, 2000
  • Sony CSL © 2004
  • Tangible User Interfaces (TUIs)  GUMMI bendable display prototype  Reproduced by permission of Sony CSL
  • Lucid Touch  Microsoft Research & Mitsubishi Electric Research Labs  Wigdor, D., Forlines, C., Baudisch, P., Barnwell, J., Shen, C. LucidTouch: A See-Through Mobile Device In Proceedings of UIST 2007, Newport, Rhode Island, October 7-10, 2007, pp. 269–278.
  • Auditory Modalities  Auditory   auditory icons   earcons   speech synthesis/recognition   Nomadic Radio (Sawhney) -  combines spatialized audio -  auditory cues -  speech synthesis/recognition
  • Gestural interfaces  1. Micro-gestures   (unistroke, smartPad)  2. Device-based gestures   (tilt based examples)  3. Embodied interaction   (eye toy)
  • Natural Gesture Interaction on Mobile  Use mobile camera for hand tracking   Fingertip detection
  • Evaluation  Gesture input more than twice as slow as touch  No difference in naturalness
  • Haptic Modalities   Haptic interfaces   Simple uses in mobiles? (vibration instead of ringtone)   Sony’s TouchEngine -  physiological experiments show you can perceive two stimuli 5 ms apart, and displacements as small as 0.2 microns
  • Haptic Input  AR Haptic Workbench   CSIRO 2003 – Adcock et al.
  • AR Haptic Interface  Phantom, ARToolKit, Magellan
  • Natural Interaction
  • The Vision of AR
  • To Make the Vision Real..  Hardware/software requirements   Contact lens displays   Free space hand/body tracking   Environment recognition   Speech/gesture recognition   Etc..
  • Natural Interaction  Automatically detecting real environment   Environmental awareness   Physically based interaction  Gesture Input   Free-hand interaction  Multimodal Input   Speech and gesture interaction   Implicit rather than Explicit interaction
  • Environmental Awareness
  • AR MicroMachines  AR experience with environment awareness and physically-based interaction   Based on MS Kinect RGB-D sensor  Augmented environment supports   occlusion, shadows   physically-based interaction between real and virtual objects
  • Operating Environment
  • Architecture  Our framework uses five libraries:   OpenNI   OpenCV   OPIRA   Bullet Physics   OpenSceneGraph
  • System Flow  The system flow consists of three sections:   Image Processing and Marker Tracking   Physics Simulation   Rendering
  • Physics Simulation  Create virtual mesh over real world  Updated at 10 fps – so users can move real objects  Used by the physics engine for collision detection (virtual/real)  Used by OpenSceneGraph for occlusion and shadows
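A minimal sketch of the mesh step above: turning a depth image into a triangle grid that a physics engine could use for collision and a renderer for occlusion. The function name and layout are illustrative, not the AR MicroMachines implementation:

```python
def depth_to_mesh(depth, grid_w, grid_h):
    """Convert a depth image (row-major list of metres) into a triangle
    mesh over the real surface, suitable for collision detection.

    Illustrative sketch: each pixel becomes a vertex at (x, y, depth),
    and each grid cell is split into two triangles (vertex indices)."""
    vertices = [(x, y, depth[y * grid_w + x])
                for y in range(grid_h) for x in range(grid_w)]
    triangles = []
    for y in range(grid_h - 1):
        for x in range(grid_w - 1):
            i = y * grid_w + x
            triangles.append((i, i + 1, i + grid_w))               # upper-left triangle
            triangles.append((i + 1, i + grid_w + 1, i + grid_w))  # lower-right triangle
    return vertices, triangles
```

Rebuilding this mesh on every depth frame (here, 10 times a second) is what lets real objects move and still occlude or collide with virtual ones.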
  • Rendering: Occlusion, Shadows
  • Natural Gesture Interaction
  • Motivation  AR MicroMachines and PhobiAR   Treated the environment as static – no tracking   Tracked objects in 2D  More realistic interaction requires 3D gesture tracking
  • Motivation  Occlusion Issues  AR MicroMachines only achieved realistic occlusion because the user’s viewpoint matched the Kinect’s  Proper occlusion requires a more complete model of scene objects
  • HITLabNZ’s Gesture Library Architecture  Five layers:   1. Hardware Interface   2. Segmentation   3. Classification/Tracking   4. Modeling (hand recognition/modeling, rigid-body modeling)   5. Gesture (static, dynamic, and context-based gestures)
  • HITLabNZ’s Gesture Library Architecture  1. Hardware Interface o  Supports PCL, OpenNI, OpenCV, and Kinect SDK o  Provides access to depth, RGB, XYZRGB o  Usage: capturing color image, depth image and concatenated point clouds from a single or multiple cameras o  For example: Kinect for Xbox 360, Kinect for Windows, Asus Xtion Pro Live
  • HITLabNZ’s Gesture Library Architecture  2. Segmentation o  Segment images and point clouds based on color, depth and space o  Usage: segmenting images or point clouds using color models, depth, or spatial properties such as location, shape and size o  For example: skin color segmentation, depth threshold
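The two example segmenters above can be sketched in a few lines. Both rules below are crude stand-ins (the fixed RGB thresholds are a common textbook heuristic, not the library’s trained colour model):

```python
def depth_threshold(depth, near, far):
    """Keep only pixels whose depth (metres) lies inside the
    [near, far] interaction volume; returns a binary mask."""
    return [1 if near <= d <= far else 0 for d in depth]

def skin_color_mask(pixels):
    """Crude RGB skin heuristic: red dominant over green and blue.
    Illustrative only -- a real system would use a trained model."""
    return [1 if (r > 95 and g > 40 and b > 20
                  and r > g and r > b and r - min(g, b) > 15)
            else 0
            for (r, g, b) in pixels]
```

In practice the two are often combined: a depth threshold isolates the interaction volume, then a colour model separates the hand from other foreground objects.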
  • HITLabNZ’s Gesture Library Architecture  3. Classification/Tracking o  Identify and track objects between frames based on XYZRGB o  Usage: identifying current position/orientation of the tracked object in space o  For example: training set of hand poses, where colors represent unique regions of the hand; raw output (without cleaning) classified on real hand input (depth image)
  • HITLabNZ’s Gesture Library Architecture  4. Modeling o  Hand Recognition/Modeling: skeleton based (for low resolution approximation), model based (for more accurate representation) o  Object Modeling: identification and tracking of rigid-body objects o  Physical Modeling (physical interaction): sphere proxy, model based, mesh based o  Usage: for general spatial interaction in AR/VR environments
  • Method  Represent models as collections of spheres moving with the models in the Bullet physics engine
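The sphere-proxy idea can be sketched as follows: greedily cover a model’s vertices with fixed-radius spheres, then test collision as sphere-sphere overlap. This is an illustrative stand-in for Bullet’s compound collision shapes, not the actual implementation:

```python
import math

def sample_spheres(vertices, radius):
    """Greedy sphere-proxy fit: cover the model's vertices with spheres
    of a fixed radius (a crude stand-in for a Bullet compound shape)."""
    centers = []
    for v in vertices:
        # Add a new sphere only where no existing sphere covers this vertex.
        if not any(math.dist(v, c) <= radius for c in centers):
            centers.append(v)
    return centers

def collides(centers_a, centers_b, radius):
    """Two proxy sets collide when any pair of spheres overlaps."""
    return any(math.dist(a, b) < 2 * radius
               for a in centers_a for b in centers_b)
```

Sphere proxies trade accuracy for speed: every collision query reduces to distance checks, which is why they suit real-time hand/object interaction.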
  • Method  Render AR scene with OpenSceneGraph, using depth map for occlusion  Shadows yet to be implemented
  • Results
  • HITLabNZ’s Gesture Library Architecture  5. Gesture o  Static (hand pose recognition) o  Dynamic (meaningful movement recognition) o  Context-based gesture recognition (gestures with context, e.g. pointing) o  Usage: issuing commands/anticipating user intention and high-level interaction
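The static/dynamic distinction above can be illustrated with two toy recognizers (the pose labels and thresholds are invented for the sketch, not the library’s vocabulary):

```python
def classify_pose(extended_fingers):
    """Static gesture: map the number of extended fingers to a pose
    name. Labels are illustrative examples only."""
    return {0: "fist", 1: "point", 5: "open_hand"}.get(
        extended_fingers, "unknown")

def detect_swipe(track, min_dist=0.2):
    """Dynamic gesture: a swipe is a net horizontal displacement of the
    tracked hand exceeding min_dist (metres) over the path."""
    if len(track) < 2:
        return None
    dx = track[-1][0] - track[0][0]
    if abs(dx) >= min_dist:
        return "swipe_right" if dx > 0 else "swipe_left"
    return None
```

A context-based layer would then interpret the same pose differently depending on what it points at, which is why it sits above both recognizers in the architecture.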
  • Multimodal Interaction
  • Multimodal Interaction  Combined speech input  Gesture and Speech are complementary   Speech -  modal commands, quantities   Gesture -  selection, motion, qualities  Previous work found multimodal interfaces intuitive for 2D/3D graphics interaction
  • 1. Marker Based Multimodal Interface  Add speech recognition to VOMAR  Paddle + speech commands
  • Commands Recognized  Create Command "Make a blue chair": to create a virtual object and place it on the paddle.  Duplicate Command "Copy this": to duplicate a virtual object and place it on the paddle.  Grab Command "Grab table": to select a virtual object and place it on the paddle.  Place Command "Place here": to place the attached object in the workspace.  Move Command "Move the couch": to attach a virtual object in the workspace to the paddle so that it follows the paddle movement.
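The five command classes above suggest a simple verb-first dispatch from recognized utterance to system action. The grammar below is an illustrative sketch, not the VOMAR recognizer’s actual vocabulary:

```python
def parse_command(utterance):
    """Map a recognized utterance to a (verb, argument) action tuple,
    following the five command classes above. Grammar is illustrative."""
    words = utterance.lower().rstrip(".").split()
    verb = words[0]
    if verb == "make":                       # "Make a blue chair"
        return ("create", " ".join(words[2:]))
    if verb == "copy":                       # "Copy this"
        return ("duplicate", "selection")
    if verb == "grab":                       # "Grab table"
        return ("grab", " ".join(words[1:]))
    if verb == "place":                      # "Place here"
        return ("place", "paddle_location")
    if verb == "move":                       # "Move the couch"
        return ("move", " ".join(words[2:]))
    return ("unknown", utterance)
```

The deictic arguments ("this", "here") are deliberately left symbolic: resolving them requires the paddle’s pose, which is exactly where speech and gesture input meet.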
  • System Architecture
  • Object Relationships  “Put the chair behind the table”  Where is behind?  View-specific regions
  • User Evaluation  Performance time   Speech + static paddle significantly faster  Gesture-only condition less accurate for position/orientation  Users preferred speech + paddle input
  • Subjective Surveys
  • 2. Free Hand Multimodal Input  Use free hand to interact with AR content  Recognize simple gestures: Point, Move, Pick/Drop  No marker tracking
  • Multimodal Architecture
  • Multimodal Fusion
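One common way a fusion stage combines the two modalities is time-window pairing: each speech command is matched with the nearest gesture event inside a short window, so deictic speech resolves against gestured targets. The function and the 0.5 s window below are an illustrative sketch, not the system’s documented design:

```python
FUSION_WINDOW = 0.5  # seconds: speech and gesture must co-occur to fuse

def fuse(speech_events, gesture_events, window=FUSION_WINDOW):
    """Pair each timestamped speech command with the closest-in-time
    gesture target inside the window. Events are (time, payload)."""
    fused = []
    for s_time, command in speech_events:
        candidates = [(abs(s_time - g_time), target)
                      for g_time, target in gesture_events
                      if abs(s_time - g_time) <= window]
        if candidates:
            # min() picks the candidate with the smallest time gap.
            fused.append((command, min(candidates)[1]))
    return fused
```

Unpaired speech is simply dropped here; a real system would instead fall back to speech-only interpretation or prompt the user.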
  • Hand Occlusion
  • User Evaluation  Change object shape, colour and position  Conditions   Speech only, gesture only, multimodal  Measure   performance time, error, subjective survey
  • Experimental Setup  Change object shape and colour
  • Results  Average performance time (MMI, speech fastest)   Gesture: 15.44s   Speech: 12.38s   Multimodal: 11.78s  No difference in user errors  User subjective survey   Q1: How natural was it to manipulate the object? -  MMI, speech significantly better   70% preferred MMI, 25% speech only, 5% gesture only
  • Intelligent Interfaces
  • Intelligent Interfaces  Most AR systems stupid   Don’t recognize user behaviour   Don’t provide feedback   Don’t adapt to user  Especially important for training   Scaffolded learning   Moving beyond check-lists of actions
  • Intelligent Interfaces  AR interface + intelligent tutoring system   ASPIRE constraint based system (from UC)   Constraints -  relevance cond., satisfaction cond., feedback
  • Domain Ontology
  • Intelligent Feedback  Actively monitors user behaviour   Implicit vs. explicit interaction  Provides corrective feedback
  • Evaluation Results  16 subjects, with and without ITS  Improved task completion  Improved learning
  • Intelligent Agents  AR characters   Virtual embodiment of system   Multimodal input/output  Examples   AR Lego, Welbo, etc   Mr Virtuoso -  AR character more real, more fun -  On-screen 3D and AR similar in usefulness
  • Context Sensing
  • Context Sensing  TKK Project  Using context to manage information  Context from   Speech   Gaze   Real world  AR Display
  • Gaze Interaction
  • AR View
  • More Information Over Time
  • Experiences
  • Novel Experiences  Crossing Boundaries   Ubiquitous VR/AR  Collaborative Experiences  Massive AR   AR + Social Networking  Usability
  • Crossing Boundaries Jun Rekimoto, Sony CSL
  • Invisible Interfaces Jun Rekimoto, Sony CSL
  • Milgram’s Reality-Virtuality continuum  Real Environment – Augmented Reality (AR) – Augmented Virtuality (AV) – Virtual Environment  The span between the two extremes of this Reality-Virtuality (RV) Continuum is Mixed Reality
  • The MagicBook  Spans the continuum: Reality – Augmented Reality (AR) – Augmented Virtuality (AV)
  • Example: Visualizing Sensor Networks  Rauhala et al. 2007 (Linköping)  Network of Humidity Sensors   ZigBee wireless communication  Use Mobile AR to Visualize Humidity
  • UbiVR – CAMAR  CAMAR Controller, CAMAR Viewer, CAMAR Companion (GIST, Korea)
  • ubiHome @ GIST  Diagram of services (media services, light service, MR window), sensors and devices (ubiTrack, Tag-it, ubiKey, PDA, couch sensor, door sensor) and the context each provides (who/what/when/where/how)  ©ubiHome
  • CAMAR - GIST (CAMAR: Context-Aware Mobile Augmented Reality)
  •  UCAM: Architecture  Diagram: wear-UCAM, ubi-UCAM and vr-UCAM components – Content, Sensor, Service (Integrator, Manager, Interpreter, ServiceProvider), Context Interface and Network Interface over BAN/PAN (BT) and TCP/IP (Discovery, Control, Event), on top of the Operating System
  • Hybrid User Interfaces  Goal: to incorporate AR into a normal meeting environment  Physical Components   Real props  Display Elements   2D and 3D (AR) displays  Interaction Metaphor   Use multiple tools – each relevant for the task
  • Hybrid User Interfaces  Four configurations:   1. Personal – private display   2. Tabletop – private display + group display   3. Whiteboard – private display + public display   4. Multigroup – private display + group display + public display
  • Diagram: Weiser’s Terminal → Ubiquitous axis crossed with Milgram’s Reality → Virtual Reality axis, giving Mobile AR, Desktop AR and VR along the terminal row and UbiComp, Ubi AR and Ubi VR along the ubiquitous row  From: Joe Newman
  • Diagram: Single User → Massive Multi-User along Weiser’s Terminal → Ubiquitous axis, vs. Reality → VR along Milgram’s axis
  • Remote Collaboration
  • AR Client  HMD and HHD   Showing virtual images over real world   Images drawn by remote expert   Local interaction
  • Shared Visual Context (Fussell, 1999)  Remote video collaboration   Shared manual, video viewing   Compared video, audio, and side-by-side collaboration   Communication analysis
  • WACL (Kurata, 2004)  Wearable Camera/Laser Pointer   Independent pointer control   Remote panorama view
  • WACL (Kurata, 2004)  Remote Expert View   Panorama viewing, annotation, image capture
  • As If Being There (Poelman, 2012)  AR + Scene Capture   HMD viewing, remote expert   Gesture input   Scene capture (PTAM), stereo camera
  • As If Being There (Poelman, 2012)  Gesture Interaction   Hand postures recognized   Menu superimposed on hands
  • Real World Capture  Using Kinect for 3D Scene Capture   Camera tracking   AR overlay   Remote situational awareness
  • Remote scene capture with AR annotations added
  • Future Directions: Massive Multiuser  Handheld AR for the first time allows extremely high numbers of AR users  Requires   New types of applications/games   New infrastructure (server/client/peer-to-peer)   Content distribution…
  • Massive MultiUser  2D Applications   MSN – 29 million   Skype – 10 million   Facebook – 100m+  3D/VR Applications   SecondLife > 50K   Stereo projection - <500  Augmented Reality   Shared Space (1999) - 4   Invisible Train (2004) - 8
  • Augmented Reality 2.0 Infrastructure
  • Leveraging Web 2.0  Content retrieval using HTTP  XML encoded meta information   KML placemarks + extensions  Queries   Based on location (from GPS, image recognition)   Based on situation (barcode markers)  Queries also deliver tracking feature databases  Everybody can set up an AR 2.0 server  Syndication:   Community servers for end-user content   Tagging  AR client subscribes to arbitrary number of feeds
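The KML-placemark feeds described above are plain XML, so an AR 2.0 server can emit them with any XML library. A hedged sketch using Python’s standard library (element names follow KML 2.2; attaching the 3D model via a `Link` element is an assumed AR extension, not part of base KML):

```python
import xml.etree.ElementTree as ET

def ar_placemark(name, lon, lat, model_url):
    """Build a KML Placemark for an AR content feed: a named point
    location plus a link to the model to overlay there."""
    kml = ET.Element("kml", xmlns="http://www.opengis.net/kml/2.2")
    pm = ET.SubElement(kml, "Placemark")
    ET.SubElement(pm, "name").text = name
    point = ET.SubElement(pm, "Point")
    # KML orders coordinates longitude-first.
    ET.SubElement(point, "coordinates").text = f"{lon},{lat}"
    link = ET.SubElement(pm, "Link")  # assumed AR extension: model to overlay
    ET.SubElement(link, "href").text = model_url
    return ET.tostring(kml, encoding="unicode")
```

Because the payload is just KML over HTTP, existing Web 2.0 plumbing (feeds, tagging, syndication) applies unchanged, which is the point of the slide.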
  • Content  Content creation and delivery   Content creation pipeline   Delivering previously unknown content  Streaming of   Data (objects, multi-media)   Applications  Distribution   How do users learn about all that content?   How do they access it?
  • ARML (AR Markup Language)
  • Scaling Up  AR on a City Scale  Using mobile phone as ubiquitous sensor  MIT Senseable City Lab   http://senseable.mit.edu/
  • WikiCity Rome (Senseable City Lab MIT)
  • Conclusions
  • AR Research in the HIT Lab NZ  Gesture interaction   Gesture library  Multimodal interaction   Collaborative speech/gesture interfaces  Mobile AR interfaces   Outdoor AR, interaction methods, navigation tools  AR authoring tools   Visual programming for AR  Remote Collaboration   Mobile AR for remote interaction
  • More Information  Mark Billinghurst – mark.billinghurst@hitlabnz.org  Websites   http://www.hitlabnz.org/   http://artoolkit.sourceforge.net/   http://www.osgart.org/   http://www.hitlabnz.org/wiki/buildAR/