Hands and Speech in Space

Speech given by Mark Billinghurst at the AWE 2014 conference on how to use multimodal speech and gesture interaction with Augmented Reality applications. Talk given on May 28th, 2014.

Transcript

  • 1. Hands and Speech in Space. Mark Billinghurst, mark.billinghurst@hitlabnz.org, The HIT Lab NZ, University of Canterbury, May 28th 2014
  • 2. 2010 – Iron Man 2
  • 3. To Make the Vision Real...
      - Hardware/software requirements
          - Contact lens displays
          - Free-space hand/body tracking
          - Speech/gesture recognition
          - Etc.
      - Most importantly: usability/user experience
  • 4. Natural Hand Interaction
      - Using bare hands to interact with AR content
      - MS Kinect depth sensing
      - Real-time hand tracking
      - Physics-based simulation model
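
A minimal sketch of how the depth-based bare-hand segmentation described above might be approached. This is an illustration only, not the talk's actual implementation; the array layout and thresholds are assumptions.

```python
# Minimal sketch of depth-based hand segmentation (illustrative, not the
# system from the talk). Assumes `depth_mm` is a Kinect-style depth frame
# in millimetres, e.g. a (480, 640) numpy array where 0 means "no reading".
import numpy as np

def segment_hand(depth_mm, near=400, band=150):
    """Return a boolean mask of pixels likely to belong to the nearest hand.

    Heuristic: the hand is assumed to be the object closest to the sensor,
    so keep pixels within `band` mm of the nearest valid depth reading.
    """
    valid = depth_mm > 0
    if not valid.any():
        return np.zeros_like(depth_mm, dtype=bool)
    nearest = max(depth_mm[valid].min(), near)   # ignore readings inside the sensor's minimum range
    return valid & (depth_mm <= nearest + band)

def hand_centroid(mask):
    """Pixel-space centroid of the hand mask, or None if nothing was found."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    return float(xs.mean()), float(ys.mean())
```
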
  • 5. Pros and Cons of Gesture-Only Input
      - Gesture-only input is good for
          - Direct manipulation
          - Selection, motion
          - Rapid expressiveness
      - Limitations
          - Descriptions (e.g. temporal information)
          - Operation on large numbers of objects
          - Indirect manipulation, delayed actions
  • 6. Multimodal Interaction
      - Combined speech and gesture input
      - Gesture and speech are complementary
          - Speech: modal commands, quantities
          - Gesture: selection, motion, qualities
      - Previous work found multimodal interfaces intuitive for 2D/3D graphics interaction
      - However, there are few multimodal AR interfaces
  • 7. Wizard of Oz Study
      - What speech and gesture input would people like to use?
      - Wizard
          - Performs speech recognition
          - Command interpretation
      - Domain
          - 3D object interaction/modelling
      Lee, M., & Billinghurst, M. (2008, October). A Wizard of Oz study for an AR multimodal interface. In Proceedings of the 10th International Conference on Multimodal Interfaces (pp. 249-256). ACM.
  • 8. System Architecture
  • 9. System Set Up
  • 10. Key Results
      - Most commands were multimodal
          - Multimodal (63%), gesture only (34%), speech only (4%)
      - Most spoken phrases were short
          - 74% were phrases averaging 1.25 words long
          - Sentences (26%) averaged 3 words
      - Main gestures were deictic (65%) and metaphoric (35%)
      - In multimodal commands the gesture was issued first
          - Gesture began before speech 94% of the time
  • 11. Free Hand Multimodal Input
      - Use the free hand to interact with AR content
      - Recognize simple gestures
          - Open hand, closed hand, pointing (Point / Move / Pick-Drop)
      Lee, M., Billinghurst, M., Baek, W., Green, R., & Woo, W. (2013). A usability study of multimodal input in an augmented reality environment. Virtual Reality, 17(4), 293-305.
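
A minimal sketch of mapping the three simple hand poses above to commands. The finger-count input and the exact pose-to-command mapping are assumptions rather than the published implementation.

```python
# Minimal sketch of mapping simple hand poses to the three commands named
# on the slide (Point / Move / Pick-Drop). The finger-counting input and
# the pose-to-command mapping are assumptions, not the study's code.
from enum import Enum

class HandCommand(Enum):
    POINT = "point"       # single extended finger
    MOVE = "move"         # open hand
    PICK_DROP = "pick"    # closed hand (fist)
    NONE = "none"

def classify_hand(extended_fingers: int) -> HandCommand:
    """Classify a tracked hand from the number of extended fingers."""
    if extended_fingers >= 4:
        return HandCommand.MOVE       # open hand
    if extended_fingers == 1:
        return HandCommand.POINT      # pointing
    if extended_fingers == 0:
        return HandCommand.PICK_DROP  # fist: pick up or drop an object
    return HandCommand.NONE           # ambiguous pose, ignore
```
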
  • 12. Speech Input
      - MS Speech + MS SAPI (> 90% accuracy)
      - Single-word speech commands
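
A minimal sketch of a single-word command grammar of the kind that could sit behind an MS SAPI recognizer. The vocabulary and confidence threshold here are hypothetical, since the slide does not list the actual command words.

```python
# Minimal sketch of a single-word command grammar. The word list is
# hypothetical; the study's actual vocabulary is not given on the slide.
COMMAND_WORDS = {
    "red": ("colour", "red"),
    "blue": ("colour", "blue"),
    "cube": ("shape", "cube"),
    "sphere": ("shape", "sphere"),
    "bigger": ("scale", 1.25),
    "smaller": ("scale", 0.8),
}

def parse_speech(hypothesis: str, confidence: float, threshold: float = 0.9):
    """Map a recognized word to a (slot, value) command, or None.

    Rejecting low-confidence hypotheses helps keep the error rate low when
    only single-word commands are allowed.
    """
    if confidence < threshold:
        return None
    return COMMAND_WORDS.get(hypothesis.strip().lower())
```
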
  • 13. Multimodal Architecture
  • 14. Multimodal Fusion
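
The slide names the fusion stage without detail. Below is a minimal time-window fusion sketch, motivated by the earlier Wizard of Oz finding that gesture usually begins before the accompanying speech; the event format and the two-second window are assumptions, not the system's actual fusion logic.

```python
# Minimal sketch of time-window fusion for speech + gesture events.
# The window size and event format are assumptions; the slide does not
# specify how fusion was implemented.
from dataclasses import dataclass
from typing import Optional

@dataclass
class GestureEvent:
    time: float          # seconds
    command: str         # e.g. "point", "move", "pick"
    target: str          # id of the selected object

@dataclass
class SpeechEvent:
    time: float
    slot: str            # e.g. "colour", "shape"
    value: object

def fuse(gesture: Optional[GestureEvent],
         speech: Optional[SpeechEvent],
         window: float = 2.0):
    """Combine a gesture selection with a speech attribute if they co-occur.

    If the two events fall within `window` seconds of each other, emit one
    multimodal command; otherwise each event can still stand on its own
    (gesture-only or speech-only input).
    """
    if gesture and speech and abs(speech.time - gesture.time) <= window:
        return {"target": gesture.target, speech.slot: speech.value}
    if gesture:
        return {"target": gesture.target, "action": gesture.command}
    if speech:
        return {speech.slot: speech.value}
    return None
```
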
  • 15. Hand Occlusion
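
The slide gives only the title. As a sketch of the common depth-test approach to hand occlusion (not necessarily the system's method), virtual pixels are drawn only where the virtual surface is closer than the real depth reading, so the real hand hides virtual content behind it.

```python
# Hedged sketch of depth-based hand occlusion. Virtual pixels are composited
# over the camera image only where the virtual surface is closer to the
# camera than the real (Kinect) depth reading.
import numpy as np

def composite_with_occlusion(camera_rgb, real_depth_mm, virtual_rgb, virtual_depth_mm):
    """Blend a rendered virtual layer over the camera image with occlusion.

    Inputs are HxW (depth) or HxWx3 (colour) arrays; a virtual depth of 0
    means "no virtual content at this pixel".
    """
    has_virtual = virtual_depth_mm > 0
    real_valid = real_depth_mm > 0
    # Draw the virtual pixel unless a valid real surface sits in front of it.
    draw = has_virtual & (~real_valid | (virtual_depth_mm < real_depth_mm))
    out = camera_rgb.copy()
    out[draw] = virtual_rgb[draw]
    return out
```
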
  • 16. Experimental Setup
      - Change object shape and colour
  • 17. User Evaluation
      - Change object shape, colour and position
      - Conditions
          - (1) Speech only, (2) gesture only, (3) multimodal
      - Measures
          - Performance time, errors, subjective survey
  • 18. Results - Performance
      - Average performance time
          - Gesture: 15.44 s
          - Speech: 12.38 s
          - Multimodal: 11.78 s
      - Significant difference across conditions (p < 0.01)
          - Difference between gesture and speech/MMI
  • 19. Subjective Results (Likert 1-7)
      - User subjective survey
          - Gesture rated significantly worse; MMI and speech rated the same
          - MMI perceived as most efficient
      - Preference
          - 70% MMI, 25% speech only, 5% gesture only

                            Gesture   Speech    MMI
          Naturalness         4.60     5.60     5.80
          Ease of Use         4.00     5.90     6.00
          Efficiency          4.45     5.15     6.05
          Physical Effort     4.75     3.15     3.85
  • 20. Observations
      - Significant difference in number of commands
          - Gesture (6.14), Speech (5.23), MMI (4.93)
      - MMI: simultaneous vs. sequential commands
          - 79% sequential, 21% simultaneous
      - Reaction to system errors
          - Users almost always repeated the same command
          - In MMI they rarely changed modalities
  • 21. Lessons Learned
      - Multimodal interaction is significantly better than gesture alone in AR interfaces for 3D tasks
          - Shorter task time, more efficient
      - Multimodal input was more natural, easier, and more effective than gesture or speech only
      - Simultaneous input was rarely used
      - More studies need to be conducted
          - What gesture/speech patterns? Richer input
  • 22. 3D Gesture Tracking
      - 3Gear Systems
          - Kinect/PrimeSense sensor
          - Two-hand tracking
          - http://www.threegear.com
  • 23. Skeleton Interaction + AR
      - HMD AR view
      - Viewpoint tracking
      - Two-hand input
      - Skeleton interaction, occlusion
  • 24. AR Rift Display
  • 25. Conclusions
      - AR experiences need new interaction methods
      - Combined speech and gesture input is more powerful
          - Complementary input modalities
      - Natural user interfaces are possible
          - Free-hand gesture, speech, intelligent interfaces
      - Important research directions for the future
          - What gesture/speech commands should be used?
          - What is the relationship between speech and gesture?
  • 26. More Information
      - Mark Billinghurst
          - Email: mark.billinghurst@hitlabnz.org
          - Twitter: @marknb00
      - Website
          - http://www.hitlabnz.org/