Speech data augmentation for improving phoneme
transcriptions of aphasic speech using wav2vec 2.0
for the PSST Challenge
Birger Moëll, Jim O’Regan, Shivam Mehta, Ambika Kirkland, Harm Lameris,
Joakim Gustafson, Jonas Beskow
Experiments
2022-05-30 2
• Data augmentation:
– Augmenting the original (PSST) data
– Matching out-of-domain speech (TIMIT)
• (Phonetic) language models
• Voice conversion
• Synthetic voices
Data augmentations
• Approach
• We propose data augmentation and training on non-aphasic datasets to increase the robustness and accuracy of phoneme transcription models for aphasic speech.
• Steps:
• Data augmentation using pitch shift, Gaussian noise, time stretch, voice conversion and room impulse response (RIR)
• Joint training on non-aphasic datasets (Common Voice, TIMIT) and the aphasic dataset (PSST)
Datasets
Key findings
• Augmenting with a manually transcribed dataset (TIMIT) was successful.
• Augmenting with an automatically phonetized dataset (Common Voice) was unsuccessful.
• Acoustically matching the TIMIT data to PSST using room impulse response (RIR) augmentation improved performance.
● PSST Dataset
○ Speech from people with aphasia on the Boston Naming Test - Short Form (BNT-SF) and the Verb Naming Test (VNT)
● TIMIT Dataset
○ Acoustic-Phonetic Continuous Speech Corpus
● Common Voice
○ Volunteer-based open-source dataset for ML
Models
We compared the base and large wav2vec 2.0 models:
• The large model performed better.
• The base model was still useful for faster experimentation.
Voice conversion
VOICE CONVERSION WAS USED TO
AUGMENT DATA BY NEURAL VOICE
CLONING
EXPERIMENTS WITH VOICE CLONING
PROVED UNSUCCESSFUL, LIKELY
BECAUSE OF PSST DATA QUALITY
ISSUES
Pitch shift
PITCH SHIFT WAS THE MOST
SUCCESSFUL PSST AUGMENTATION,
IMPROVING RESULTS COMPARED TO
THE BASELINE
THE PITCH WAS RANDOMLY
RAISED OR LOWERED
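The slide does not say which tool or semitone range was used. As a rough sketch of the technique, assuming 16 kHz mono audio held in a NumPy array, the code below shifts pitch the way phase-vocoder tools usually do: it time-stretches the signal with a phase vocoder and then resamples the result back to the original duration. The STFT sizes (`N_FFT`, `HOP`), the ±2 semitone range in `random_pitch_shift`, and all function names are illustrative assumptions, not the authors' settings.

```python
import numpy as np

N_FFT = 512  # assumed STFT window (32 ms at 16 kHz)
HOP = 128    # assumed hop length (75% overlap)


def stft(y):
    """Hann-windowed STFT, shape (bins, frames)."""
    win = np.hanning(N_FFT)
    n_frames = 1 + (len(y) - N_FFT) // HOP
    return np.stack(
        [np.fft.rfft(win * y[i * HOP:i * HOP + N_FFT]) for i in range(n_frames)], axis=1
    )


def istft(spec):
    """Weighted overlap-add inverse of stft()."""
    win = np.hanning(N_FFT)
    n_frames = spec.shape[1]
    out = np.zeros(N_FFT + HOP * (n_frames - 1))
    norm = np.zeros_like(out)
    for i in range(n_frames):
        seg = slice(i * HOP, i * HOP + N_FFT)
        out[seg] += win * np.fft.irfft(spec[:, i], n=N_FFT)
        norm[seg] += win ** 2
    nonzero = norm > 1e-8
    out[nonzero] /= norm[nonzero]
    return out


def phase_vocoder(spec, rate):
    """Time-scale a spectrogram by `rate` (< 1 = longer) while keeping pitch."""
    n_bins, n_frames = spec.shape
    steps = np.arange(0, n_frames - 1, rate)
    # Phase advance per hop expected for the centre frequency of each bin.
    expected = 2 * np.pi * HOP * np.arange(n_bins) / N_FFT
    phase = np.angle(spec[:, 0])
    out = np.empty((n_bins, len(steps)), dtype=complex)
    for t, step in enumerate(steps):
        i, frac = int(step), step - int(step)
        mag = (1 - frac) * np.abs(spec[:, i]) + frac * np.abs(spec[:, i + 1])
        out[:, t] = mag * np.exp(1j * phase)
        # Deviation from the expected advance, wrapped to [-pi, pi].
        dphi = np.angle(spec[:, i + 1]) - np.angle(spec[:, i]) - expected
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase = phase + expected + dphi
    return out


def pitch_shift(y, n_steps):
    """Shift pitch by n_steps semitones without changing duration."""
    rate = 2.0 ** (-n_steps / 12)
    stretched = istft(phase_vocoder(stft(y), rate))  # roughly len(y) / rate samples
    # Read the stretched signal 1/rate times faster (linear interpolation):
    # this restores the duration and scales every frequency by 1/rate.
    positions = np.arange(len(y)) / rate
    return np.interp(positions, np.arange(len(stretched)), stretched)


def random_pitch_shift(y, rng, max_steps=2.0):
    """Randomly raise or lower the pitch; max_steps is an assumed range."""
    return pitch_shift(y, rng.uniform(-max_steps, max_steps))
```

Linear interpolation has no anti-aliasing filter, so this sketch only suits small shifts; in practice a library routine such as librosa's `librosa.effects.pitch_shift` is the more robust choice.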
Gaussian Noise
GAUSSIAN NOISE WAS
USEFUL AS A
REGULARIZATION
TECHNIQUE
A PURELY ADDITIVE
TECHNIQUE FOR DATA
AUGMENTATION
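Because the noise is purely additive, the clean signal stays intact and only the signal-to-noise ratio (SNR) needs choosing. A minimal sketch, assuming a NumPy waveform; the SNR range in `random_gaussian_noise` is an assumption, since the slide gives no noise level.

```python
import numpy as np


def add_gaussian_noise(y, snr_db, rng):
    """Add zero-mean white Gaussian noise at the requested SNR (in dB)."""
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=y.shape)
    return y + noise  # purely additive: the clean signal is left untouched


def random_gaussian_noise(y, rng, snr_range=(10.0, 30.0)):
    """Draw an SNR from an assumed range for each training example."""
    return add_gaussian_noise(y, rng.uniform(*snr_range), rng)
```

Drawing a fresh SNR per example means the model sees a spread of noise levels, which is what makes the noise act as a regularizer rather than a fixed distortion.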
Time stretch
TIME STRETCH WAS USED TO
REGULARIZE THE MODEL BY VARYING
THE DURATION OF THE SPEECH
IT GAVE LIMITED IMPROVEMENT
OVER THE BASELINE
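Time stretching changes the speaking rate without changing pitch. A minimal sketch, assuming a NumPy waveform, using plain overlap-add (OLA): frames are read at one hop size and written at another. This is a deliberately simple stand-in, not necessarily what the authors used; OLA can introduce phasing artefacts that WSOLA or a phase vocoder avoid. The frame sizes and the rate range in `random_time_stretch` are assumptions.

```python
import numpy as np


def time_stretch_ola(y, rate, frame=512, syn_hop=128):
    """Overlap-add time stretch: rate > 1 shortens the signal, rate < 1 lengthens it.

    Frames are read every `ana_hop` samples and written every `syn_hop`
    samples, so the duration scales by roughly syn_hop / ana_hop = 1 / rate.
    """
    ana_hop = max(1, int(round(syn_hop * rate)))
    win = np.hanning(frame)
    n_frames = 1 + (len(y) - frame) // ana_hop
    out = np.zeros(frame + syn_hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for i in range(n_frames):
        a, s = i * ana_hop, i * syn_hop
        out[s:s + frame] += win * y[a:a + frame]
        norm[s:s + frame] += win  # track window overlap for normalization
    nonzero = norm > 1e-8
    out[nonzero] /= norm[nonzero]
    return out


def random_time_stretch(y, rng, rate_range=(0.8, 1.25)):
    """Pick a random stretch factor from an assumed range."""
    return time_stretch_ola(y, rng.uniform(*rate_range))
```

With `rate=1.0` the analysis and synthesis hops coincide and the input is reconstructed (apart from the zero-weighted edge samples), which is a quick sanity check for any time-stretch implementation.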
Room impulse response
ROOM IMPULSE RESPONSE WAS USED
TO MIMIC THE REVERBERANT
CONDITIONS OF THE PSST DATA
IT WAS THE MOST SUCCESSFUL
AUGMENTATION OF THE TIMIT DATA
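Reverberation is simulated by convolving clean speech with a room impulse response. The slides do not say where the RIRs came from, so the sketch below uses a hypothetical synthetic RIR (exponentially decaying Gaussian noise with an assumed RT60) in place of a measured one; to match PSST acoustics, an RIR measured in similar conditions would be loaded instead. Function names and parameters are illustrative.

```python
import numpy as np


def synthetic_rir(sr, rt60, rng):
    """Hypothetical stand-in for a measured RIR: exponentially decaying noise."""
    t = np.arange(int(sr * rt60)) / sr
    # The envelope falls by a factor of 1000 (60 dB) at t = rt60.
    envelope = np.exp(-np.log(1000.0) * t / rt60)
    rir = rng.normal(size=t.shape) * envelope
    rir[0] = 1.0  # direct path
    return rir / np.max(np.abs(rir))


def apply_rir(y, rir):
    """Convolve speech with an RIR, keeping the original length and peak level."""
    wet = np.convolve(y, rir)[: len(y)]
    peak = np.max(np.abs(wet))
    if peak == 0:
        return wet
    return wet * (np.max(np.abs(y)) / peak)
```

Truncating to the input length keeps transcript alignment simple, and rescaling to the original peak prevents the convolution from changing loudness.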
Augmentations on augmentations
We experimented with adding
Pitch Shift + Time Stretch +
Gaussian Noise + Room
Impulse Response to
TIMIT.
Because that’s how we roll.
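The slides do not say how the four augmentations were combined. One common pattern, sketched here under that assumption, is to apply them in a fixed order with an independent coin flip for each; the probabilities and the augmentation callables are hypothetical placeholders.

```python
import random
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def chain_augmentations(
    x: T,
    steps: Sequence[Tuple[float, Callable[[T], T]]],
    rng: random.Random,
) -> T:
    """Apply each augmentation in order, each gated by its own probability."""
    for prob, augment in steps:
        if rng.random() < prob:  # independent coin flip per augmentation
            x = augment(x)
    return x


if __name__ == "__main__":
    # Toy stand-ins for the real augmentations (hypothetical placeholders).
    steps = [(1.0, lambda v: v + 1), (1.0, lambda v: v * 10)]
    print(chain_augmentations(1, steps, random.Random(0)))  # prints 20
```

Order matters: placing the RIR step last, for example, also reverberates any noise added before it, which may or may not match the intended recording conditions.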
Language models
WE EXPERIMENTED WITH
(PHONETIC) LANGUAGE MODELS
THEY LED TO LITTLE OR NO
IMPROVEMENT OVER THE
BASELINE
Results
• Data augmentation improves model performance.
• Increasing the model size decreases both the phoneme error rate (PER) and the feature error rate (FER).
• Manually transcribed speech from non-aphasic speakers (TIMIT) improves performance
– when room impulse response (RIR) is used to augment the data.
• The best-performing model combines aphasic and non-aphasic data:
– 21.0% PER
– 9.2% FER
– relative improvement of 9.8%
• Data augmentation, larger model size, and additional non-aphasic data sources can all be helpful.
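PER is the edit distance between the predicted and reference phoneme sequences, divided by the number of reference phonemes. A minimal sketch in pure Python, assuming ARPAbet-style phoneme labels and corpus-level aggregation; the official PSST scoring may aggregate differently. FER, which compares phonological features rather than whole phonemes, is not shown.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,               # deletion
                cur[j - 1] + 1,            # insertion
                prev[j - 1] + (r != h),    # substitution or match
            ))
        prev = cur
    return prev[-1]


def phoneme_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Corpus-level PER: total edits divided by total reference phonemes."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total
```

For example, transcribing "cat" as `K AA T` instead of `K AE T` is one substitution over three reference phonemes, a PER of 1/3.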
Thank you!
Challenge Results
We came second! 😊
Of two teams 😭
The other team was Baidu 🙄
Gale, R. C., Fleegle, M., Fergadiotis, G., & Bedrick, S. (2022, June). The Post-Stroke Speech Transcription (PSST) Challenge. Proceedings of The RaPID4 Workshop - Resources and ProcessIng of Linguistic, Para-Linguistic and Extra-Linguistic Data from People with Various Forms of Cognitive/Psychiatric/Developmental Impairments - within the 13th Language Resources and Evaluation Conference, 41–55. Retrieved from https://aclanthology.org/2022.rapid4-1.6
