
SpeeG - A Multimodal Speech- and Gesture-based Text Input Solution

Presentation given at AVI 2012, International Working Conference on Advanced Visual Interfaces, Capri Island, Italy, May 2012

ABSTRACT: We present SpeeG, a multimodal speech- and body gesture-based text input system targeting media centres, set-top boxes and game consoles. Our controller-free zoomable user interface combines speech input with a gesture-based real-time correction of the recognised voice input. While the open source CMU Sphinx voice recogniser transforms speech input into written text, Microsoft’s Kinect sensor is used for the hand gesture tracking. A modified version of the zoomable Dasher interface combines the input from Sphinx and the Kinect sensor. In contrast to existing speech error correction solutions with a clear distinction between a detection and correction phase, our innovative SpeeG text input system enables continuous real-time error correction. An evaluation of the SpeeG prototype has revealed that low error rates for a text input speed of about six words per minute can be achieved after a minimal learning phase. Moreover, in a user study SpeeG has been perceived as the fastest of all evaluated user interfaces and therefore represents a promising candidate for future controller-free text input.
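
To make the idea of a Dasher-style interface fed by a recogniser more concrete, here is a minimal sketch. It is not the SpeeG implementation: the function names, the example alternatives and their probabilities are all hypothetical, and the real system is built on JDasher, CMU Sphinx 4 and Kinect hand tracking. The sketch only illustrates the general Dasher principle of giving each alternative screen space proportional to its probability, so that a rough hand position is enough to pick a likely alternative.

```python
from typing import List, Tuple

Band = Tuple[str, float, float]  # (alternative, lower bound, upper bound)


def layout_alternatives(alternatives: List[Tuple[str, float]],
                        height: float = 1.0) -> List[Band]:
    """Give each alternative a vertical band proportional to its probability.

    `alternatives` is a hypothetical list of (text, probability) pairs, e.g.
    recogniser hypotheses; the probabilities need not sum to exactly 1.
    """
    total = sum(p for _, p in alternatives)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    bands: List[Band] = []
    top = 0.0
    for text, p in alternatives:
        size = height * p / total
        bands.append((text, top, top + size))
        top += size
    return bands


def select(bands: List[Band], hand_y: float) -> str:
    """Return the alternative whose band contains the normalised hand height."""
    for text, lo, hi in bands:
        if lo <= hand_y < hi:
            return text
    # A hand position at (or rounding past) the top edge maps to the last band.
    return bands[-1][0]


if __name__ == "__main__":
    # Hypothetical alternatives and probabilities, for illustration only.
    bands = layout_alternatives([("easy", 0.6), ("eat", 0.3), ("each", 0.1)])
    print(select(bands, 0.5))
```

Because likely alternatives occupy larger bands, imprecise pointing still tends to land on them, which matches the "allows imprecise navigation" property the slides attribute to Dasher.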

Paper: http://vub.academia.edu/BeatSigner/Papers/1484787/SpeeG_A_Multimodal_Speech-_and_Gesture-based_Text_Input_Solution


  1. SpeeG: A Multimodal Speech- and Gesture-based Text Input Solution. Lode Hoste, Bruno Dumas and Beat Signer
  2. Text input for set-top boxes. Vrije Universiteit Brussel, SpeeG - Lode Hoste
  3. (no text)
  4. (no text)
  5. Text input for set-top boxes
  6. 1D Keyboard for Kinect, Chatpad Controller, Virtual Keyboard for Xbox, SwiftKey, 8Pen, EdgeWriter, Dasher, Speech Dasher, SpeeG
  7. Virtual keyboard
  8. Kinect 1D keyboard
  9. Kinect 1D keyboard
  10. 1D Keyboard for Kinect, Chatpad Controller, Virtual Keyboard for Xbox, SwiftKey, 8Pen, EdgeWriter, Dasher, Speech Dasher, SpeeG
  11. 1D Keyboard for Kinect, Chatpad Controller, Virtual Keyboard for Xbox, SwiftKey, 8Pen, EdgeWriter, Dasher, Speech Dasher, SpeeG
  12. Dasher: continuous input; joystick / gaze / ...; open vocabulary; allows imprecise navigation
  13. Dasher
  14. Goals: controller-free, text input, without training. Used technologies: Kinect, CMU Sphinx, Dasher
  15. SpeeG
  16. (no text)
  17. SpeeG Architecture: User, GUI (JDasher), Speech Recogniser (CMU Sphinx 4), Hand Tracking (Microsoft Kinect and NITE). [Diagram: components connected by steps numbered 1 to 5]
  18. Evaluation: Virtual Keyboard, Kinect Keyboard, Speech-only, SpeeG
  19. Evaluation: 7 (male) users, aged 23-31. Sentences 1-3 (DARPA’s TIMIT): “this was easy for us”, “he will allow a rare lie”, “did you eat yet”. Sentences 4-6 (MacKenzie and Soukoreff): “my watch fell in the water”, “the world is a stage”, “peek out the window”. Performed a quantitative (words per minute and number of errors) and qualitative (feedback and preference) evaluation
  20. Virtual keyboard: 6.3 WPM. [Chart: WPM (0 to 10) per sentence S1 to S6, Users 1 to 7]
  21. Kinect Keyboard: 1.83 WPM. [Chart: WPM (0.00 to 3.50) per sentence S1 to S6, Users 1 to 7; User 7 marked with an asterisk]
  22. Speech-only: 11 WPM. [Chart: WPM (0 to 40) per sentence S1 to S6, Users 1 to 7]
  23. SpeeG: 5.8 WPM. [Chart: WPM (0 to 10) per sentence S1 to S6, Users 1 to 7]
  24. SpeeG: 2.6 7.8 WPM. [Chart: WPM (0 to 10) per sentence S1 to S6, Users 1 to 7]
  25. Mean WPM per sentence and input device: Virtual Keyboard for Xbox, 1D Keyboard for Xbox, Speech-only, SpeeG. [Chart: WPM (0 to 25) per sentence S1 to S6; legend: Controller, Speech only, Kinect only, SpeeG]
  26. Errors per sentence and input device: Virtual Keyboard for Xbox, 1D Keyboard for Xbox, Speech-only, SpeeG. [Chart: mean number of errors (0 to 10) per sentence S1 to S6; legend: Controller, Speech only, Kinect only, SpeeG]
  27. (no text)
  28. Future work: other visualisations; smaller gestures; dedicated commands (gesture / voice)
  29. (no text)
  30. SpeeG: A Multimodal Speech- and Gesture-based Text Input Solution. Lode Hoste, Bruno Dumas, Beat Signer. Kinect, Speech: controller-free text input; non-native speakers; real-time correction; untrained voice recogniser; Dasher, zoomable interface; 6-12 WPM; probabilities; perceived fastest; alphabetic order; game-like character; character-level; novice and experts. Special thanks to Jorn De Baerdenmaeker and Keith Vertaenen
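
The evaluation slides report input speed in words per minute (WPM) without stating how a "word" was counted. As a rough illustration only, the sketch below uses the convention common in text entry research of treating five characters (including spaces) as one word; this convention, the function name and the 60-second timing are assumptions, not details taken from the slides.

```python
def words_per_minute(transcribed: str, seconds: float,
                     chars_per_word: int = 5) -> float:
    """Entry speed in WPM, assuming `chars_per_word` characters per word.

    The five-characters-per-word convention is an assumption made for this
    sketch; the presentation does not say which definition was used.
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    words = len(transcribed) / chars_per_word
    return words * 60.0 / seconds


if __name__ == "__main__":
    # One of the test sentences from the slides, with a hypothetical
    # entry time of 60 seconds.
    sentence = "my watch fell in the water"
    print(round(words_per_minute(sentence, 60.0), 2))  # prints 5.2
```

Under this convention the 26-character sentence entered in one minute counts as 5.2 WPM; counting whole words instead would give 6 WPM for the same input, which is why the definition matters when comparing figures across studies.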
