MindCare 2014: Towards a Smart Wearable Tool to Enable People with SSPI to Communicate by Sentence Fragments

  1. Towards a Smart Wearable Tool to Enable People with SSPI to Communicate by Sentence Fragments. Gyula Vörös (ELTE), Anita Verő (ELTE), Balázs Pintér (ELTE), Brigitta Miksztay-Réthey (ELTE), Takumi Toyama (DFKI), Andras Lorincz (ELTE), Daniel Sonntag (DFKI). MindCare 2014, May 8, University of Tokyo. Tuesday, May 20, 14
  2. Motivation • The ability to communicate with others is of paramount importance for mental well-being. • SSPI: Severe speech and physical impairments / communication disorders – Cerebral palsy, stroke, ALS/motor neurone disease (MND), muscle spasticity, etc. • People with SSPI face enormous challenges in daily life activities. • A person who cannot speak may be forced to communicate only with his closest relatives, thereby relying on them completely for interacting with the external world. • Understanding people who use traditional alternative and augmentative communication (AAC) methods (such as gestures and communication boards) requires training. • These methods restrict possible communication partners to those who are already familiar with AAC.
  3. Motivation • The ability to communicate with others is of paramount importance for mental well-being. • In AAC, utterances consisting of multiple symbols are often telegraphic: they are unlike natural sentences and often lack the words needed for successful and detailed communication. (“Water now!” instead of “I would appreciate a glass of water, immediately.”) – Some systems allow users to produce whole utterances or sentences that consist of multiple words. The main task of the AAC system is to store and retrieve such utterances. – However, using a predefined set of sentences severely restricts the applicability. • Other approaches allow the generation of utterances from an unordered, incomplete set of words, but they use predefined rules that constrain communication.
  4. Augmented Reality in Medicine. Video: https://dl.dropboxusercontent.com/u/48051165/ERmed_CHI2013video.mp4
  5. System Architecture: Mobile Web and App Design. Diagram labels: XML-RPC, ERmed Proxy, ERmed Bridge, TCP; speech-based interaction, HMD & gaze-based interaction, pen-based interaction.
  6. Goals of presented work • To enable broad accessibility and communication possibilities for people with SSPI • Technical challenges: – Overcome physical impairments (for user input) – Help non-trained people to understand people with SSPI (towards generation)
  7. Approach • The most effective way for people to communicate would be spontaneous language generation. • Novel utterance generation: the ability to say “almost everything,” without a strictly predefined set of possible utterances/generations/productions. • We attempt to give people with SSPI the ability to say almost anything. For this reason, we would like to build a general system that produces novel utterances without predefined rules. We chose a data-driven approach in the form of statistical language modeling. • In some conditions (e.g., cerebral palsy), people suffer from communication and very severe movement disorders at the same time. For them, special peripherals are necessary. Eye tracking provides a promising alternative to people who cannot use their hands.
  8. Fourfold solution (1) Smart glasses with gaze trackers (thereby extending ) (gaze tracking and interpretation/graphical symbol selection) (2) Symbol / utterance selection (3) Language generation (sentence generation, using language models to propose the best hypotheses) (4) Text-to-speech functionality (TTS)
  9. Hardware components • Eye tracking glasses (ETG) – Forward-looking camera – Eye-tracking cameras • Head mounted display (HMD) – See-through • Motion Processing Unit (MPU) – Capture head gestures that indicate recalibration need
  10. Eye Tracking Glasses: Eye Tracking Glasses by SensoMotoric Instruments GmbH
  11. Head Mounted Display: AiRScouter Head Mounted Display by Brother Industries
  12. Motion Processing Unit: MPU-9150 Motion Processing Unit by InvenSense Inc.
  13. Communication Board
  14. Main functions • Symbol selection with gaze tracking – Calibration is crucial and poses a problem – Adapt or recalibrate? • Utterance generation – Based on selected symbols – Uses natural language models
  15. Usage scenario and steps
  16. Experiment 1: Retina HMD Setup
  17. Symbol selection on HMD • Idea: the user selects communication symbols on the HMD, using eye tracking • Crucial problems with calibration – Eye tracking has to be calibrated – Calibration may degrade over time – Adapt or recalibrate? • The user might tolerate calibration errors (when they are small and negligible)
  18. Symbol selection on HMD
  19. Symbol selection game on HMD • Goal: select the numbers in correct order (1234123…) • Each selection adds a small amount of error • Participants indicate when the error is too large to be tolerated • 4 participants, 4 experiments each • On average, errors of up to 2.7° deviation from the actual eye focus are tolerable.
  20. Real communication experiments • The Brother Retina HMD was too small to show a full-sized communication board (CB) (although the resolution is quite good) • We use a projector / beamer to display the CB – Simulate a large HMD • Recognition feedback is provided (to the user) – Estimated gaze position + selection • Speech generation – The word for the selected symbol is synthesized
  21. Experiment 2: Projector as HMD Surrogate. Real patient(s)
  22. Participant (video: https://www.youtube.com/watch?v=C4DOmMtgqz0) • The participant in this example test is a 30-year-old man with cerebral palsy. • He usually communicates with a headstick and an alphabetical communication board, or with a PC-based virtual keyboard controlled by head tracking. • Mousense is already an improvement.
  23. Bliss symbols and words (in Hungarian) on board
  24. • tea, two, sugar (tea, kettő, cukor) • tea, lemon, sugar (tea, citrom, cukor) • one, glass, soda (egy, pohár, kóla)
  25. Communication Setting • The participant could move his head • The head position must be accounted for (MPU) • Fiducial markers around the board are recognized by vision-based pattern recognition techniques
  26. Usage and HCI details • The estimated gaze position was indicated as a small red circle on the projected board (similar to the previous test). • A symbol was selected by keeping the red circle on it for two seconds. • The eye tracker sometimes needs recalibration: – The user could initiate recalibration by raising his head (detected by the MPU). – Once the recalibration process was triggered, a distinct sound was played, and an arrow indicated where the participant had to look to recalibrate.
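The two-second dwell rule on slide 26 can be sketched as follows. This is only an illustration under assumptions, not the authors' implementation: the `Symbol` rectangle layout, the `(time, x, y)` sample format, and the rule that the gaze must leave a symbol before it can be selected again are all hypothetical; only the two-second dwell time comes from the slide.

```python
from dataclasses import dataclass

DWELL_SECONDS = 2.0  # dwell time stated on the slide


@dataclass(frozen=True)
class Symbol:
    """One rectangular cell of the board (hypothetical layout)."""
    label: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, gx, gy):
        return (self.x <= gx < self.x + self.width
                and self.y <= gy < self.y + self.height)


def symbol_at(board, gx, gy):
    """Return the symbol under the gaze point, or None."""
    for symbol in board:
        if symbol.contains(gx, gy):
            return symbol
    return None


def dwell_select(samples, board, dwell=DWELL_SECONDS):
    """Return the labels selected by resting the gaze on a symbol.

    `samples` is an iterable of (time_in_seconds, x, y) gaze estimates.
    """
    selected = []
    current, since, fired = None, 0.0, False
    for t, gx, gy in samples:
        hit = symbol_at(board, gx, gy)
        if hit != current:
            # Gaze moved to another symbol (or off the board): restart timer.
            current, since, fired = hit, t, False
        elif current is not None and not fired and t - since >= dwell:
            selected.append(current.label)
            fired = True  # assumption: gaze must leave before re-selecting
    return selected
```

A real system would feed this with the gaze position after it has been mapped onto board coordinates, which the slides handle with fiducial markers and the MPU.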
  27. Communication Setup
  28. Communication experiments • Goal: communicate with a partner • Two situations (with different boards) – Buying food – Discussing an appointment
  29. Experimental results • Verification – To verify that communication was successful, the participant indicated misunderstandings using well-known yes-no gestures, which were quick and reliable. Moreover, a certified expert in AAC was present to indicate apparent communication problems. – 205 symbol selections were made – 23 of them were incorrect • 89% accuracy – The error rate was acceptable • Real communication took place!
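The accuracy figure on slide 29 follows directly from the two reported counts:

```latex
\[
\text{accuracy} = \frac{205 - 23}{205} = \frac{182}{205} \approx 0.888 \approx 89\%
\]
```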
  30. Experiment 3: External Symbols (towards mixed reality setup)
  31. External symbols • Idea – Not all symbols are present in the system – Optical character recognition can help – (Object recognition can help) • Example – The user wants to buy a certain type of sandwich – In the store, there are labels with the names of the sandwich types on them
  32. External symbols - Communication Setting
  34. Results • Wizard-of-Oz (WoZ) experiment for OCR (due to poor lighting conditions) • Similar “good” results as in experiment 2
  35. Technical input processing and sentence generation methods
  36. Sentence fragment generation • A word guessing game – Good afternoon, how are … (you?) – I apologize for being late, I am very … (sorry!) – My favorite OS is … ([Linux, Mac OS, Windows XP, Win CE]) • A symbol interpretation game • The LM needs to help – To select words with the right sense (disambiguate homonyms: e.g., river bank versus money bank) – To select hypotheses where graphical symbols (words) are ‘in the right place’ – To increase cohesion between words (agreement)
  37. Sentence fragment generation • Understanding symbol communication requires practice • Symbol communication is non-syntactical – Function words are rarely used – Order of symbols may vary • e.g., {lemon, sugar, tea} -> tea with lemon and sugar • Idea – Generate fragments by inserting function words – Rank them based on language models – User should select from top 4 variants
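The idea on slide 37 (insert function words, rank with a language model, offer the top 4) can be illustrated with a brute-force sketch. It is not the presented system: `TOY_BIGRAMS` and `toy_score` are hypothetical stand-ins for a real language model, and allowing at most one function word between two symbols is an assumption made to keep the example small.

```python
from itertools import permutations, product


def candidate_fragments(symbols, function_words):
    """Yield every symbol ordering with zero or one function word per gap."""
    if not symbols:
        return
    fillers = [None] + list(function_words)
    for order in permutations(symbols):
        for gaps in product(fillers, repeat=len(order) - 1):
            words = [order[0]]
            for filler, symbol in zip(gaps, order[1:]):
                if filler is not None:
                    words.append(filler)
                words.append(symbol)
            yield tuple(words)


def top_fragments(symbols, function_words, score, k=4):
    """Rank all candidates by `score` and return the best k as strings."""
    ranked = sorted(candidate_fragments(symbols, function_words),
                    key=score, reverse=True)
    return [" ".join(fragment) for fragment in ranked[:k]]


# Hypothetical bigram weights standing in for a trained language model.
TOY_BIGRAMS = {
    ("tea", "with"): 3, ("with", "lemon"): 2, ("with", "sugar"): 2,
    ("lemon", "and"): 2, ("and", "sugar"): 3,
    ("sugar", "and"): 2, ("and", "lemon"): 2,
}


def toy_score(fragment):
    """Sum bigram weights; unseen bigrams are penalised."""
    return sum(TOY_BIGRAMS.get(pair, -1)
               for pair in zip(fragment, fragment[1:]))


best = top_fragments(["lemon", "sugar", "tea"], ["with", "and"], toy_score)
# best[0] is "tea with lemon and sugar" under these toy weights
```

Exhaustive enumeration grows quickly with the number of symbols and function words, which is one motivation for the pruned prefix-tree search described on slides 43 and 44.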
  38. Language Models • Estimate the probabilities of n-grams (sequences of words) • P(tea with sugar) > P(tea the sugar) – Use a corpus (collection of texts) • Sparsity problem – Long n-grams tend to be rare – Smoothing and backoff methods are used to deal with this problem
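A minimal sketch of how such estimates are obtained from counts, and of how sparsity shows up. The four-sentence corpus is invented for illustration; the real system uses the large corpora listed on slide 41.

```python
from collections import Counter


def ngram_counts(sentences, max_n=3):
    """Count all n-grams up to length max_n in whitespace-tokenised text."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def relative_frequency(counts, ngram):
    """Estimate P(last word | preceding words) as a ratio of counts.

    Returns 0.0 when the history was never seen, which is exactly the
    sparsity problem that backoff and smoothing address.
    """
    ngram = tuple(ngram)
    history = ngram[:-1]
    if history:
        denominator = counts[history]
    else:
        denominator = sum(c for g, c in counts.items() if len(g) == 1)
    return counts[ngram] / denominator if denominator else 0.0


# Invented toy corpus, for illustration only.
corpus = ["tea with sugar", "tea with lemon",
          "coffee with sugar", "the tea is hot"]
counts = ngram_counts(corpus)

p_good = relative_frequency(counts, ("tea", "with", "sugar"))  # 0.5
p_bad = relative_frequency(counts, ("tea", "the", "sugar"))    # 0.0
```

Here "tea the" never occurs, so the estimate for "tea the sugar" is zero rather than merely small.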
  39. Backoff and Smoothing • Backoff: use lower order statistics when the n-gram is rare – Example: stupid backoff • Smoothing: interpolate with lower order statistics – Example: Kneser-Ney smoothing
  40. Stupid backoff. Let $w_1^L = (w_1, \ldots, w_L)$ denote a string of $L$ tokens over a fixed vocabulary. An n-gram language model assigns it the probability $P(w_1^L) = \prod_{i=1}^{L} P(w_i \mid w_1^{i-1}) \approx \prod_{i=1}^{L} \hat{P}(w_i \mid w_{i-n+1}^{i-1})$, where the approximation reflects the Markov assumption that only the most recent $n-1$ tokens are relevant when predicting the next word. For any substring $w_i^j$ of $w_1^L$, let $f(w_i^j)$ denote the frequency of occurrence of that substring in the (much longer) training data. The maximum likelihood probability estimates for the n-grams are given by their relative frequencies, $r(w_i \mid w_{i-n+1}^{i-1}) = f(w_{i-n+1}^{i}) / f(w_{i-n+1}^{i-1})$. This is problematic because the numerator or denominator can be zero, so appropriate conditions are needed on the right-hand side. The estimates are also noisy and need smoothing.
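As a hedged illustration of the scheme named on this slide: in stupid backoff as defined by Brants et al. (2007), the score of a word is the relative frequency of the n-gram if it was seen, and otherwise a fixed factor (0.4 in that paper) times the score under a context shortened by one word, ending at the unigram frequency. The sketch below follows that definition on an invented toy token sequence; the function names are hypothetical, and the resulting scores are not normalised probabilities.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor used by Brants et al. (2007)


def count_ngrams(tokens, max_n):
    """Count all n-grams of length 1..max_n in a token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff(counts, total_tokens, word, context, alpha=ALPHA):
    """Score `word` given `context` (a tuple of preceding words)."""
    context = tuple(context)
    penalty = 1.0
    while True:
        if not context:
            # Base case: unigram relative frequency.
            return penalty * counts[(word,)] / total_tokens
        ngram = context + (word,)
        if counts[ngram] > 0:
            # Seen n-gram: relative frequency f(ngram) / f(context).
            return penalty * counts[ngram] / counts[context]
        # Unseen: drop the oldest context word and apply the penalty.
        context = context[1:]
        penalty *= alpha


# Invented toy data, for illustration only.
tokens = "tea with sugar and tea with lemon".split()
counts = count_ngrams(tokens, max_n=3)
seen = stupid_backoff(counts, len(tokens), "sugar", ("tea", "with"))
backed_off = stupid_backoff(counts, len(tokens), "lemon", ("and", "with"))
```

In this toy example `seen` uses the trigram counts directly, while `backed_off` falls back to the bigram "with lemon" and is multiplied by the factor once.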
  41. Language corpora in use • Google Books n-gram corpus – Collection of digitized books from Google – Very large, freely available – Represents written language • OpenSubtitles corpus – Collection of film and TV subtitles – Moderate size, freely available – Represents spoken language
  42. Language modeling tools • Google Books n-gram corpus – Software: BerkeleyLM • Pauls, A., Klein, D.: Faster and Smaller N-Gram Language Models. In: 49th Annual Meeting of the ACL: Human Language Technologies, Vol. 1, pp. 258-267. ACL, Stroudsburg, USA (2011) – Method: Stupid Backoff • Brants, T., Popat, A. C., Xu, P., Och, F. J., Dean, J.: Large language models in machine translation. In: EMNLP 2007, pp. 858-867. • OpenSubtitles corpus – Software: KenLM • Heafield, K.: KenLM: Faster and Smaller Language Model Queries. In: EMNLP 2011 Sixth Workshop on Statistical Machine Translation, pp. 187-197. ACL, Stroudsburg, USA (2011) – Method: Modified Kneser-Ney smoothing • Heafield, K., Pouzyrevsky, I., Clark, J. H., Koehn, P.: Scalable Modified Kneser-Ney Language Model Estimation. In: 51st Annual Meeting of the Association for Computational Linguistics, pp. 690-696. Curran Associates, Inc., New York, USA (2013)
  43. Prefix tree building algorithm • Representation – Work with a prefix tree, where each path represents an n-gram • Input – Set of named entities (e.g., tea, sugars) – Set of function words (e.g., you, with, for) – Desired length of sentence fragment (e.g., 3 words) • Parameters – Score threshold (minimal score, e.g., 10^-30) – Leaf limit (maximal number of open leaves, e.g., 200) • Output – Tree (or a list) of potential sentence fragments
  44. Prefix tree building algorithm • Essentially a breadth-first search, with some constraints • Start with an empty tree; the root node is open • While there is an open node: – Extend all open leaves with all available words, if the resulting n-gram’s “score” is above a certain threshold – Discard every open leaf node that cannot be extended (falls below threshold)
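The search on slides 43 and 44 might look roughly like the sketch below. Several details are assumptions because the slides do not specify them: paths are stored as tuples rather than explicit tree nodes, the leaf limit is enforced by keeping the best-scoring leaves after each level, each named entity may appear at most once, and `toy_score` with `TOY_BIGRAM_PROBS` is a hypothetical stand-in for the language model score.

```python
import math


def build_fragments(entities, function_words, length, score,
                    threshold=1e-30, leaf_limit=200):
    """Breadth-first construction of candidate fragments of `length` words."""
    vocabulary = list(entities) + list(function_words)
    open_leaves = [()]  # the root of the prefix tree: an empty path
    for _ in range(length):
        children = []
        for path in open_leaves:
            for word in vocabulary:
                if word in entities and word in path:
                    continue  # assumption: each entity is used at most once
                child = path + (word,)
                if score(child) > threshold:
                    children.append(child)
        # Leaves without any surviving child are discarded implicitly.
        # Assumption: the leaf limit keeps the best-scoring leaves.
        children.sort(key=score, reverse=True)
        open_leaves = children[:leaf_limit]
    return open_leaves


# Hypothetical bigram probabilities standing in for a language model.
TOY_BIGRAM_PROBS = {
    ("tea", "with"): 0.1,
    ("with", "sugar"): 0.2,
    ("sugar", "for"): 0.01,
}


def toy_score(path):
    """Product of bigram probabilities; unseen bigrams get a tiny floor."""
    return math.prod(TOY_BIGRAM_PROBS.get(pair, 1e-40)
                     for pair in zip(path, path[1:]))


fragments = build_fragments(["tea", "sugar"], ["with", "for"], 3, toy_score)
```

With these toy values, only paths built from known bigrams stay above the threshold, so most branches are pruned after the second level.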
  45. Initial state
  46. Step 1
  47. Step 2
  48. Step 3
  49. Step 4
  50. Step 5
  51. Step 6
  52. Examples of generated text fragments
  53. Conclusion • A system to enable people with SSPI to communicate with natural language sentences • Demonstrated the feasibility of the approach
  54. Our contribution • An interaction system to reduce communication barriers for people with severe speech and physical impairments (SSPI) such as cerebral palsy. • The system consists of two main components: (i) the head-mounted human-computer interaction (HCI) part consisting of smart glasses with gaze trackers and text-to-speech functionality (which implement a communication board and the selection tool), and (ii) a natural language processing pipeline in the backend that generates complete sentences from the symbols on the board. • We developed the components to provide smooth interaction between the user and the system, including gaze tracking, symbol selection, symbol recognition, and sentence generation. • Our results suggest that such systems can dramatically increase the communication efficiency of people with SSPI.
  55. Eye Tracking Requirements • Eye gaze is a compelling interaction modality but requires user calibration before interaction can commence. • State-of-the-art procedures require the user to fixate on a succession of calibration markers, a task that is often experienced as difficult and tedious.
  56. Possible improvements and outlook • 3D gaze tracker that estimates the part of 3D space observed by the user • OCR, signs and object recognition methods to convert “symbols” to the communication board • New HMDs and eye tracker setups (ergonomic) • Advanced NLP to transform series of symbols to whole domain sentences within the context • Integrated calibration: e.g., https://www.andreas-bulling.de/fileadmin/docs/pfeuffer13_uist.pdf • Personalisation according to patient record • CPS integration
  57. http://getnarrative.com/ The Narrative Clip is a tiny, automatic camera and app that gives you a searchable and shareable photographic memory.
  59. https://www.spaceglasses.com
  60. Thank you for your attention!
  61. Potential CPS systems • Intervention (e.g., collision avoidance); • Precision (e.g., robotic surgery); • Operation in dangerous or inaccessible environments (e.g., search and rescue); • Augmented reality, and the augmentation of human capabilities (e.g., healthcare monitoring and decision support for doctors and patients). – We bring together augmented reality and augmentation of human capabilities in the mobile medical context: mobile CPS, in which the physical system with a medical purpose has inherent mobility. – This is motivated by the rise in popularity of mobile interaction devices such as smartphones and tablets, which has increased interest among CPS developers. – As technologies have become small, mobile, and pervasive, the logical next step (beyond the rapid growth of smartphones or tablet PCs or surrounding computers) is the usage of mobile augmented reality and mobile decision support CPS. • Alternative and augmentative communication – AAC & AR (mixed reality) for the patient – Dementia and SSPI. http://www.dfki.de/RadSpeech/Kognit
  62. N-gram LM • Probability for a sentence: $P(w_1^L) = P(w_1, \ldots, w_L)$ • Factorize (chain rule) without loss of generality: $P(w_1^L) = \prod_{i=1}^{L} P(w_i \mid w_1^{i-1})$ • Limit length of history – Unigram LM: $P(w_i \mid w_1^{i-1}) \approx P(w_i)$ – Bigram LM: $P(w_i \mid w_1^{i-1}) \approx P(w_i \mid w_{i-1})$ – Trigram LM: $P(w_i \mid w_1^{i-1}) \approx P(w_i \mid w_{i-2}, w_{i-1})$
  63. MPU • The accelerometer is used to detect orientation and acceleration. It can also be applied to detect free-fall. The accelerometer returns three values (in units of g), one along each of the x, y, and z-axes. • The gyroscope is used to detect rotation in space, and returns information in units of degrees per second. The gyroscope also returns three values along the x, y, and z-axes.
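One way the accelerometer readings could be used to detect the head raise that triggers recalibration (slide 26) is to estimate pitch from the direction of gravity. The sketch below is an assumption-laden illustration, not the presented method: the axis convention (x pointing forward from the wearer's head), the 20° threshold, and the requirement of five consecutive samples are all hypothetical, and the gyroscope is ignored.

```python
import math


def pitch_degrees(ax, ay, az):
    """Pitch of the sensor x-axis above the horizontal, from gravity alone.

    Inputs are accelerometer readings in units of g. Assumes the head is
    roughly still, so that the measured acceleration is mostly gravity.
    """
    return math.degrees(math.atan2(ax, math.hypot(ay, az)))


def head_raised(samples, baseline_deg=0.0, threshold_deg=20.0, min_samples=5):
    """Return True if pitch stays above the threshold long enough.

    `samples` is an iterable of (ax, ay, az) readings. The threshold and
    the run length are hypothetical tuning values.
    """
    run = 0
    for ax, ay, az in samples:
        if pitch_degrees(ax, ay, az) - baseline_deg >= threshold_deg:
            run += 1
            if run >= min_samples:
                return True
        else:
            run = 0  # require consecutive samples to reject jitter
    return False
```

Requiring several consecutive samples is a simple guard against brief head movements being mistaken for a deliberate recalibration request.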
