Assessing Virtual Assistant Capabilities with Italian Dysarthric Speech


Presentation of "Assessing Virtual Assistant Capabilities with Italian Dysarthric Speech" (ASSETS 2018)

  1. Assessing Virtual Assistant Capabilities with Italian Dysarthric Speech. Fabio Ballati, Fulvio Corno, Luigi De Russis (Politecnico di Torino, Italy). ASSETS 2018, October 22-24, 2018, Galway.
  2. Background and Motivation
     • Usage of smartphone-based virtual assistants is growing worldwide
     • Such assistants generally have a positive impact on device accessibility
     • People with speech impairments such as dysarthria may be unable to use these virtual assistants proficiently
  3. Goal
     • Investigate whether and how people with moderate dysarthria can be understood by three virtual assistants: Siri, Google Assistant, and Cortana
     • Investigate which assistant provides the most coherent answer when the recognized speech is at least partially correct
     • Propose a methodology for collecting dysarthric speech samples to evaluate smartphone-based virtual assistants
     • Focus: ALS-induced dysarthria and the Italian language
  4. Work Phases
     • SENTENCE SELECTION: selection of 34 sentences suitable for virtual assistants
     • DATA COLLECTION: to collect dysarthric speech samples, we designed a specific methodology and recorded the 34 sentences from 8 people with ALS
     • ASSESSMENT: we played the collected speech samples to the assistants to assess (i) the accuracy of the transcription and (ii) the coherence of the answers
  5. Sentence Selection
     • Goal: to have a set of sentences to record, suitable for smartphone-based virtual assistants
     • We extracted 34 sentences from the recommended questions for virtual assistants
     • We then slightly modified them to include all the phonemes of the Italian language
     Sample sentences (translated into English):
     • Do I need to take an umbrella, today?
     • How many proteins are in two eggs?
     • Add onion and tomatoes to my shopping list
     • Who is the president of the Italian republic?
     • Set the home temperature to 22 degrees.
     • Set an alarm at 8am.
     • …
  6. Data Collection: Participants
     Goal: to have a dataset of dysarthric speech samples that allows us to assess the behavior of virtual assistants
     • 8 native Italian speakers with ALS-induced dysarthria (4M, 4F), aged 64-83
     • Three types of dysarthria, within two speech intelligibility categories
     • Dysarthria types: Flaccid, Spastic, or Unilateral Upper Motor Neuron (Duffy classification)
     • Intelligibility categories: "Intelligible with repeating" and "Detectable speech disturbance" (ALS Functional Rating Scale)
  7. Data Collection: Procedure
     • Simple process, to be easily reproduced
     • The participant read each of the 34 sentences from an A4 sheet of paper (one sheet per sentence) placed in front of them, while we recorded them
     • The recordings were taken with a smartphone placed 30-40 centimeters from the participant
  8. Assessment
     Goal: to investigate the accuracy of the transcription and the coherence of the answers of the virtual assistants
     • The assessment took place in a quiet room of our university
     • The recorded speech samples were played on a laptop connected to an external high-quality speaker
     • Each of the 272 recordings (34 sentences × 8 participants) was played for Siri, Google Assistant, and Cortana, separately, on three different smartphones
     • Devices: iPhone 7 (iOS 11.2), Samsung A5 (Android 8.1), and Lumia 910 (Windows 10 Mobile)
     • The results of the operation (recognized request and related response) were noted down
  9. Measures: Question Comprehension (QC) [ASSESSMENT]
     Qualitative QC: classification of each provided transcription as one of
     • Correct
     • Same semantic meaning
     • Incomplete
     • Wrong
     • Not recognized
     Quantitative QC: Word Error Rate (WER), given by the similarity between the original sentence and the provided transcription
     • WER = (S + I + D) / N, where S, I, and D are the numbers of substituted, inserted, and deleted words, and N is the number of words in the original sentence
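The WER formula on this slide can be illustrated with a word-level edit distance. The following is a minimal sketch, not the authors' scoring script: the function name `word_error_rate`, the lowercasing and whitespace tokenization, and the example transcription are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (S + I + D) / N, computed with a word-level edit distance.

    Assumption: words are compared case-insensitively after splitting on
    whitespace; punctuation is NOT stripped.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr

    # prev[-1] is the minimum S + I + D; N is the reference length.
    return prev[-1] / len(ref)


# Hypothetical transcription of one sample sentence: "an" is dropped
# (1 deletion) and "8am" becomes "8pm" (1 substitution), so WER = 2 / 5.
print(word_error_rate("set an alarm at 8am", "set alarm at 8pm"))
```

Because insertions count as errors, WER can exceed 1 when the transcription is much longer than the original sentence.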
  10. Measures: Consistency in Answers [ASSESSMENT]
     • An indicator of the appropriateness of the assistants' responses
     • Computed only for sentences transcribed correctly or with the same semantic meaning
     • Given as the number and percentage of times that a virtual assistant provided a certain type of answer:
     • Coherent answers, i.e., correct or logically consistent responses
     • Incoherent answers, i.e., logically incoherent responses
     • Default answers, i.e., responses that an assistant provides by default when it is not able to fully understand or extract any context
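The "number and percentage" measure described on this slide amounts to a tally over labeled responses. A minimal sketch follows, assuming each scored response has already been hand-labeled as `coherent`, `incoherent`, or `default`; the function name `answer_breakdown` and the label strings are hypothetical. The example input uses Siri's counts from the results slide (26, 13, and 4 out of 43).

```python
from collections import Counter


def answer_breakdown(labels):
    """Return {answer_type: (count, percentage)} for a list of labels.

    Percentages are relative to the number of labeled responses and are
    rounded to two decimals, matching the precision used on the slides.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        label: (n, round(100 * n / total, 2))
        for label, n in counts.items()
    }


# Labels rebuilt from Siri's reported counts; the label names are assumptions.
siri_labels = ["coherent"] * 26 + ["default"] * 13 + ["incoherent"] * 4
for answer_type, (count, pct) in answer_breakdown(siri_labels).items():
    print(f"{answer_type}: {count} ({pct}%)")
```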
  11. Results: Quantitative QC [ASSESSMENT]
     • WER was highly dependent upon the participant
     • The average WER for Google Assistant was lower than for Cortana
     • Siri performed the worst
     • The same trend appeared in the results of individual participants
  12. Results: Qualitative QC [ASSESSMENT]
     • Google Assistant: Correct 135 (49.63%), Same semantic meaning 39 (14.33%), Incomplete 39 (14.33%), Wrong 58 (21.32%), Not recognized 1 (0.37%)
     • Cortana: Correct 85 (31.25%), Same semantic meaning 23 (8.45%), Incomplete 20 (7.35%), Wrong 141 (51.83%), Not recognized 3 (1.10%)
     • Siri: Correct 36 (13.23%), Same semantic meaning 7 (2.58%), Incomplete 32 (11.76%), Wrong 149 (54.78%), Not recognized 48 (17.65%)
     Overall results are similar to Quantitative QC, with Google Assistant performing better than the other two
  13. Results: Consistency in Answers [ASSESSMENT]
     • Google Assistant (174 scored requests): Coherent answer 94 (54.02%), Default answer 78 (44.83%), Incorrect answer 2 (1.15%)
     • Cortana (108 scored requests): Coherent answer 26 (24.07%), Default answer 82 (75.93%), Incorrect answer 0 (0%)
     • Siri (43 scored requests): Coherent answer 26 (60.47%), Default answer 13 (30.23%), Incorrect answer 4 (9.30%)
     The answers provided by Google Assistant and Siri were mostly coherent
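The totals in parentheses on this slide (174, 108, 43) follow from the Qualitative QC results: answers were scored only for transcriptions that were correct or had the same semantic meaning. The sketch below re-derives them from the reported counts; the dictionary layout and the helper name `understood` are illustrative assumptions.

```python
# Counts transcribed from the Qualitative QC slide.
QUALITATIVE = {
    "Google Assistant": {"correct": 135, "same_meaning": 39,
                         "incomplete": 39, "wrong": 58, "not_recognized": 1},
    "Cortana": {"correct": 85, "same_meaning": 23,
                "incomplete": 20, "wrong": 141, "not_recognized": 3},
    "Siri": {"correct": 36, "same_meaning": 7,
             "incomplete": 32, "wrong": 149, "not_recognized": 48},
}

# 34 sentences recorded by each of the 8 participants.
TOTAL_RECORDINGS = 34 * 8


def understood(counts: dict[str, int]) -> int:
    """Requests eligible for answer scoring: correct or same semantic meaning."""
    return counts["correct"] + counts["same_meaning"]


for assistant, counts in QUALITATIVE.items():
    # Every recording falls into exactly one qualitative category.
    assert sum(counts.values()) == TOTAL_RECORDINGS
    print(f"{assistant}: {understood(counts)} scored requests")
```

This prints 174 for Google Assistant, 108 for Cortana, and 43 for Siri, so Siri's high share of coherent answers rests on a much smaller set of understood requests.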
  14. Key Takeaways
     Google Assistant was the best at recognizing dysarthric speech and at providing suitable answers
     • Each virtual assistant behaves differently
     • The accuracy of the transcription depends strongly on the speaker
     • Some participants can use Google Assistant without any problems
     • Siri had the least accurate transcriptions but provided a good number of suitable answers when it properly understood the request
     We plan to publicly release the collected dataset
  15. Luigi De Russis. Assessing Virtual Assistant Capabilities with Italian Dysarthric Speech