Hands and Speech in Space
Mark Billinghurst
mark.billinghurst@hitlabnz.org
The HIT Lab NZ, University of Canterbury
May 28th, 2014
Talk given by Mark Billinghurst at the AWE 2014 conference, May 28th, 2014, on using multimodal speech and gesture interaction with Augmented Reality applications.

1. Hands and Speech in Space
   Mark Billinghurst
   mark.billinghurst@hitlabnz.org
   The HIT Lab NZ, University of Canterbury
   May 28th, 2014
2. 2010 – Iron Man 2
3. To Make the Vision Real...
   - Hardware/software requirements:
     - Contact lens displays
     - Free-space hand/body tracking
     - Speech/gesture recognition
     - Etc.
   - Most importantly:
     - Usability/user experience
4. Natural Hand Interaction
   - Using bare hands to interact with AR content
   - MS Kinect depth sensing
   - Real-time hand tracking
   - Physics-based simulation model
5. Pros and Cons of Gesture-Only Input
   - Gesture-only input is good for:
     - Direct manipulation
     - Selection, motion
     - Rapid expressiveness
   - Limitations:
     - Descriptions (e.g. temporal information)
     - Operations on large numbers of objects
     - Indirect manipulation, delayed actions
6. Multimodal Interaction
   - Combined speech and gesture input
   - Gesture and speech are complementary:
     - Speech: modal commands, quantities
     - Gesture: selection, motion, qualities
   - Previous work found multimodal interfaces intuitive for 2D/3D graphics interaction
   - However, there are few multimodal AR interfaces
7. Wizard of Oz Study
   - What speech and gesture input would people like to use?
   - Wizard:
     - Performs speech recognition
     - Command interpretation
   - Domain:
     - 3D object interaction/modelling
   Reference: Lee, M., & Billinghurst, M. (2008, October). A Wizard of Oz study for an AR multimodal interface. In Proceedings of the 10th International Conference on Multimodal Interfaces (pp. 249-256). ACM.
8. System Architecture
9. System Set Up
10. Key Results
   - Most commands were multimodal:
     - Multimodal (63%), gesture only (34%), speech only (4%)
   - Most spoken phrases were short:
     - 74% were short phrases, averaging 1.25 words
     - Sentences (26%) averaged 3 words
   - Main gestures were deictic (65%) and metaphoric (35%)
   - In multimodal commands the gesture was issued first:
     - Gesture began before speech 94% of the time
11. Free Hand Multimodal Input
   - Use the free hand to interact with AR content
   - Recognize simple gestures:
     - Open hand, closed hand, pointing
   [Figure: Point, Move, and Pick/Drop gestures]
   Reference: Lee, M., Billinghurst, M., Baek, W., Green, R., & Woo, W. (2013). A usability study of multimodal input in an augmented reality environment. Virtual Reality, 17(4), 293-305.
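As a rough illustration of the kind of gesture classification involved, here is a minimal Python sketch that maps per-frame hand-tracker output to the three gestures above. The HandFrame fields, finger-count thresholds, and gesture-to-action mapping are illustrative assumptions, not details taken from the original system.

```python
# Minimal sketch: classify one frame of tracked hand data into the three
# gestures used in the study (open hand, closed hand, pointing).
# HandFrame is a hypothetical stand-in for Kinect-style tracker output.
from dataclasses import dataclass

@dataclass
class HandFrame:
    extended_fingers: int    # fingers the tracker reports as extended
    palm_position: tuple     # (x, y, z) palm centre in camera space

def classify_gesture(frame: HandFrame) -> str:
    if frame.extended_fingers >= 4:
        return "open_hand"       # mapped to 'move' in this sketch
    if frame.extended_fingers == 0:
        return "closed_hand"     # mapped to 'pick/drop'
    if frame.extended_fingers == 1:
        return "pointing"        # mapped to 'point'/selection
    return "unknown"             # ambiguous pose; ignore this frame
```

In practice a recognizer would also smooth the label over several frames so that a single noisy frame does not flip the gesture state.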
12. Speech Input
   - MS Speech + MS SAPI (> 90% accuracy)
   - Single-word speech commands
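The slides do not detail the grammar, but restricting the engine to a fixed single-word vocabulary is what keeps accuracy high. The sketch below shows that idea in Python; the command set, confidence threshold, and recognizer interface are assumptions for illustration, not the original SAPI configuration.

```python
# Minimal sketch: accept only in-vocabulary, high-confidence single-word
# commands, analogous to a constrained SAPI grammar. COMMANDS and the
# threshold are illustrative, not the original system's values.
COMMANDS = {"cube", "sphere", "cylinder", "red", "green", "blue", "move", "stop"}

def accept_command(word: str, confidence: float, threshold: float = 0.9):
    """Return a validated command, or None to ignore the utterance."""
    word = word.strip().lower()
    if word in COMMANDS and confidence >= threshold:
        return word
    return None

assert accept_command("Red", 0.95) == "red"
assert accept_command("banana", 0.99) is None   # out of vocabulary
```

A fixed grammar like this trades vocabulary size for robustness, which is consistent with the >90% recognition accuracy reported on the slide.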
13. Multimodal Architecture
14. Multimodal Fusion
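Slides 13 and 14 are diagrams, but the core fusion idea can be sketched in code: pair each speech command with a recent gesture event inside a short time window, letting gesture supply the object and speech supply the action, which matches the earlier finding that gesture almost always begins before speech. The event types and window length below are illustrative assumptions, not necessarily the talk's exact method.

```python
# Minimal sketch of time-window multimodal fusion: a speech command is
# paired with the most recent gesture that started within the window.
# This is one common fusion strategy, shown here under assumed event types.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GestureEvent:
    timestamp: float    # seconds
    target: str         # object under the hand, resolved by the AR scene

@dataclass
class SpeechEvent:
    timestamp: float
    command: str        # validated single-word command, e.g. "red"

def fuse(gestures: list, speech: SpeechEvent,
         window_s: float = 2.0) -> Optional[dict]:
    """Gesture answers 'which object'; speech answers 'what to do'."""
    for g in reversed(gestures):                         # newest first
        if 0.0 <= speech.timestamp - g.timestamp <= window_s:
            return {"object": g.target, "action": speech.command}
    return None                                          # no gesture in window
```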
15. Hand Occlusion
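Slide 15 is an image; a standard way to get correct hand occlusion with a depth camera (a sketch of the general technique, not necessarily this system's implementation) is a per-pixel depth test: wherever the sensed hand is closer than the virtual surface, show the camera pixel instead of the rendered one.

```python
# Minimal sketch: per-pixel occlusion compositing from aligned depth maps.
# Assumes HxW depth arrays in metres (0 = no sample) and HxWx3 images;
# array names and conventions are illustrative.
import numpy as np

def composite_with_occlusion(virtual_rgb, virtual_depth, camera_rgb, sensed_depth):
    valid = sensed_depth > 0                           # pixels with a depth sample
    occluded = valid & (sensed_depth < virtual_depth)  # real hand is nearer
    out = virtual_rgb.copy()
    out[occluded] = camera_rgb[occluded]               # hand covers virtual object
    return out
```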
16. Experimental Setup
   [Figure: changing object shape and colour]
17. User Evaluation
   - Task: change object shape, colour and position
   - Conditions:
     - (1) Speech only, (2) gesture only, (3) multimodal
   - Measures:
     - Performance time, errors, subjective survey
18. Results - Performance
   - Average performance time:
     - Gesture: 15.44 s
     - Speech: 12.38 s
     - Multimodal: 11.78 s
   - Significant difference across conditions (p < 0.01)
   - The difference is between gesture and speech/MMI
19. Subjective Results (Likert 1-7)
   - User subjective survey:
     - Gesture significantly worse; MMI and speech rated the same
     - MMI perceived as most efficient
   - Preference: 70% MMI, 25% speech only, 5% gesture only

                       Gesture   Speech   MMI
     Naturalness        4.60      5.60    5.80
     Ease of Use        4.00      5.90    6.00
     Efficiency         4.45      5.15    6.05
     Physical Effort    4.75      3.15    3.85
20. Observations
   - Significant difference in the number of commands used:
     - Gesture (6.14), speech (5.23), MMI (4.93)
   - MMI simultaneous vs. sequential commands:
     - 79% sequential, 21% simultaneous
   - Reaction to system errors:
     - Users almost always repeated the same command
     - In MMI, users rarely changed modalities
21. Lessons Learned
   - Multimodal interaction is significantly better than gesture alone in AR interfaces for 3D tasks:
     - Shorter task time, more efficient
   - Multimodal input was more natural, easier, and more effective than gesture or speech alone
   - Simultaneous input was rarely used
   - More studies need to be conducted:
     - What gesture/speech patterns? Richer input
22. 3D Gesture Tracking
   - 3Gear Systems
   - Kinect/PrimeSense sensor
   - Two-hand tracking
   - http://www.threegear.com
23. Skeleton Interaction + AR
   - HMD AR view
   - Viewpoint tracking
   - Two-hand input
   - Skeleton interaction, occlusion
24. AR Rift Display
25. Conclusions
   - AR experiences need new interaction methods
   - Combined speech and gesture input is more powerful:
     - Complementary input modalities
   - Natural user interfaces are possible:
     - Free-hand gesture, speech, intelligent interfaces
   - Important research directions for the future:
     - What gesture/speech commands should be used?
     - What is the relationship between speech and gesture?
26. More Information
   - Mark Billinghurst
     - Email: mark.billinghurst@hitlabnz.org
     - Twitter: @marknb00
   - Website
     - http://www.hitlabnz.org/
