SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Presentation given at AVI 2012, International Working Conference on Advanced Visual Interfaces, Capri Island, Italy, May 2012

ABSTRACT: We present SpeeG, a multimodal speech- and body gesture-based text input system targeting media centres, set-top boxes and game consoles. Our controller-free zoomable user interface combines speech input with a gesture-based real-time correction of the recognised voice input. While the open source CMU Sphinx voice recogniser transforms speech input into written text, Microsoft’s Kinect sensor is used for the hand gesture tracking. A modified version of the zoomable Dasher interface combines the input from Sphinx and the Kinect sensor. In contrast to existing speech error correction solutions with a clear distinction between a detection and correction phase, our innovative SpeeG text input system enables continuous real-time error correction. An evaluation of the SpeeG prototype has revealed that low error rates for a text input speed of about six words per minute can be achieved after a minimal learning phase. Moreover, in a user study SpeeG has been perceived as the fastest of all evaluated user interfaces and therefore represents a promising candidate for future controller-free text input.
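
The evaluation reports text input speed in words per minute (WPM) and a number of errors per sentence. The source does not say how either figure was computed, so the following is only a minimal sketch under two labelled assumptions: WPM uses the common text-entry convention of five characters per word, and errors are counted as a word-level edit distance between the reference sentence and the entered text. The function names `words_per_minute` and `word_errors` are hypothetical and not taken from the SpeeG prototype.

```python
def words_per_minute(transcribed: str, seconds: float, chars_per_word: int = 5) -> float:
    """Entry speed in WPM. ASSUMPTION: the usual text-entry convention,
    (characters entered - 1) per second, scaled to minutes, 5 chars per word."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    entered = max(len(transcribed) - 1, 0)
    return (entered / seconds) * 60.0 / chars_per_word


def word_errors(reference: str, transcribed: str) -> int:
    """Number of word insertions, deletions and substitutions needed to turn
    the transcribed text into the reference (word-level Levenshtein distance).
    ASSUMPTION: the source does not define how its errors were counted."""
    ref, hyp = reference.split(), transcribed.split()
    prev = list(range(len(hyp) + 1))  # distances for the empty reference prefix
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from the entry
                            curr[j - 1] + 1,      # extra word in the entry
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    # Example with one of the evaluation sentences and a made-up timing.
    print(words_per_minute("my watch fell in the water", seconds=30.0))
    print(word_errors("the world is a stage", "the word is stage"))
```

Under these assumptions, a 26-character sentence entered in 30 seconds corresponds to roughly 10 WPM, which gives a sense of scale for the figures reported in the slides.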

Paper: http://vub.academia.edu/BeatSigner/Papers/1484787/SpeeG_A_Multimodal_Speech-_and_Gesture-based_Text_Input_Solution

Transcript

  • 1. SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution. Lode Hoste, Bruno Dumas and Beat Signer
  • 2. Text-input for set-top boxes. Vrije Universiteit Brussel, SpeeG - Lode Hoste
  • 5. Text-input for set-top boxes
  • 6. 1D Keyboard for Kinect, Chatpad Controller, Virtual Keyboard for Xbox, SwiftKey, 8Pen, EdgeWriter, Dasher, Speech Dasher, SpeeG
  • 7. Virtual keyboard
  • 8. Kinect 1D keyboard
  • 9. Kinect 1D keyboard
  • 10. 1D Keyboard for Kinect, Chatpad Controller, Virtual Keyboard for Xbox, SwiftKey, 8Pen, EdgeWriter, Dasher, Speech Dasher, SpeeG
  • 11. 1D Keyboard for Kinect, Chatpad Controller, Virtual Keyboard for Xbox, SwiftKey, 8Pen, EdgeWriter, Dasher, Speech Dasher, SpeeG
  • 12. Dasher: Continuous input (Joystick / Gaze / ...); Open vocabulary; Allows imprecise navigation
  • 13. Dasher
  • 14. Goals: Controller-free; Text input; Without training. Used technologies: Kinect; CMU Sphinx; Dasher
  • 15. SpeeG
  • 17. SpeeG Architecture: User; GUI (JDasher); Speech Recogniser (CMU Sphinx 4); Hand Tracking (Microsoft Kinect and NITE). [Diagram with connections numbered 1-5]
  • 18. Evaluation: Virtual Keyboard; Kinect Keyboard; Speech-only; SpeeG
  • 19. Evaluation: 7 (male) users, 23-31y. Sentences 1-3 (DARPA’s TIMIT): “this was easy for us”, “he will allow a rare lie”, “did you eat yet”. Sentences 4-6 (MacKenzie and Soukoreff): “my watch fell in the water”, “the world is a stage”, “peek out the window”. Performed a quantitative (words per minute and number of errors) and qualitative (feedback and preference) evaluation
  • 20. Virtual keyboard: 6.3 WPM. [Chart: WPM per sentence (S1-S6) for Users 1-7]
  • 21. Kinect Keyboard: 1.83 WPM. [Chart: WPM per sentence (S1-S6) for Users 1-7; User 7 marked with *]
  • 22. Speech-only: 11 WPM. [Chart: WPM per sentence (S1-S6) for Users 1-7]
  • 23. SpeeG: 5.8 WPM. [Chart: WPM per sentence (S1-S6) for Users 1-7]
  • 24. SpeeG 2.6 7.8 WPM. [Chart: WPM per sentence (S1-S6) for Users 1-7]
  • 25. Mean WPM per sentence and input device: Virtual Keyboard for Xbox, 1D Keyboard for Xbox, Speech-only, SpeeG. [Chart: WPM per sentence (S1-S6); legend: Controller, Speech only, Kinect only, SpeeG]
  • 26. Errors per sentence and input device: Virtual Keyboard for Xbox, 1D Keyboard for Xbox, Speech-only, SpeeG. [Chart: mean number of errors per sentence (S1-S6); legend: Controller, Speech only, Kinect only, SpeeG]
  • 28. Future work: Other visualisations; Smaller gestures; Dedicated commands (gesture / voice)
  • 30. SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution. Lode Hoste, Bruno Dumas, Beat Signer. Kinect, Speech. Controller-free text input; Real-time correction; Dasher, zoomable interface (probabilities, alphabetic order, character-level); Non-native speakers; Untrained voice recogniser; 6-12 WPM; Perceived fastest; Game-like character; Novice and experts. Special thanks to Jorn De Baerdenmaeker and Keith Vertanen
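
The summary slide characterises the Dasher interface as character-level, ordered alphabetically and driven by probabilities. As a rough illustration of that general idea, the sketch below sizes one box per candidate character in proportion to its probability and selects the box under a vertical pointer position. It is not the SpeeG or JDasher implementation: the names `layout_boxes` and `pick`, the normalised 0..1 coordinate and the probability floor are all assumptions made for this example, and how SpeeG derives character probabilities from the speech recogniser is not described in the transcript.

```python
from typing import Dict, List, Tuple

Box = Tuple[str, float, float]  # (character, top, bottom) on a 0..height axis


def layout_boxes(char_probs: Dict[str, float], height: float = 1.0,
                 floor: float = 1e-3) -> List[Box]:
    """Stack one box per character in alphabetical order, with each box's
    size proportional to the character's probability.
    ASSUMPTION: `floor` keeps unlikely characters selectable; the value is arbitrary."""
    if not char_probs:
        raise ValueError("char_probs must not be empty")
    alphabet = sorted(char_probs)
    weights = {c: max(char_probs[c], floor) for c in alphabet}
    total = sum(weights.values())
    boxes: List[Box] = []
    top = 0.0
    for c in alphabet:
        size = height * weights[c] / total
        boxes.append((c, top, top + size))
        top += size
    return boxes


def pick(boxes: List[Box], y: float) -> str:
    """Return the character whose box contains the pointer position y
    (for example a normalised hand height; hypothetical input)."""
    for c, top, bottom in boxes:
        if top <= y < bottom:
            return c
    return boxes[-1][0]  # positions at or past the end map to the last box


if __name__ == "__main__":
    # Made-up probabilities for the next character.
    boxes = layout_boxes({"b": 0.25, "a": 0.5, "c": 0.25})
    print(boxes)
    print(pick(boxes, 0.6))
```

Because likely characters occupy larger boxes, an imprecise pointer position still tends to land on the intended character, which matches the "allows imprecise navigation" property listed for Dasher.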