Speech data augmentation for improving
phoneme transcriptions of aphasic speech using
wav2vec 2.0 for the PSST Challenge
Birger Moëll, Jim O’Regan, Shivam Mehta, Ambika Kirkland, Harm Lameris,
Joakim Gustafson, Jonas Beskow
Experiments
2022-05-30 2
• Data augmentation:
– Augmenting the original PSST data
– Matching out-of-domain speech (TIMIT)
• (Phonetic) language models
• Voice conversion
• Synthetic voices
Data augmentations
• Approach
• We propose data augmentations and training on non-aphasic datasets to increase the robustness and accuracy of phoneme transcription models for aphasic speech.
• Steps:
• Data augmentations using pitch shift, Gaussian noise, time stretch, voice conversion and room impulse response
• Joint training on non-aphasic datasets (Common Voice, TIMIT) and an aphasic dataset (PSST)
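The joint-training step can be pictured as merging the aphasic and non-aphasic utterance lists into one shuffled training set. This is a minimal sketch, not the authors' data loader: the function name, the dictionary layout, and the file names are all hypothetical.

```python
import random

def build_joint_training_set(aphasic, non_aphasic, seed=0):
    """Tag every utterance with its source corpus, merge both sets, and shuffle."""
    merged = [dict(item, source="aphasic") for item in aphasic]
    merged += [dict(item, source="non_aphasic") for item in non_aphasic]
    # A fixed seed keeps the shuffled order reproducible between runs.
    random.Random(seed).shuffle(merged)
    return merged

# Hypothetical entries: file names and transcripts are made up for illustration.
psst = [{"audio": "psst_0001.wav", "phonemes": ["HH", "AW", "S"]}]
timit = [
    {"audio": "timit_0001.wav", "phonemes": ["SH", "IY"]},
    {"audio": "timit_0002.wav", "phonemes": ["D", "AA", "R", "K"]},
]
train = build_joint_training_set(psst, timit)
```

Keeping a `source` tag on each utterance makes it possible to apply corpus-specific augmentations later, for example reverberation on the non-aphasic data only.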
Datasets
● PSST Dataset
○ Boston Naming Test - Short Form (BNT-SF) and the Verb Naming Test (VNT)
● TIMIT Dataset
○ Acoustic-Phonetic Speech Corpus
● Common Voice
○ Volunteer-based open-source dataset for ML
Key findings
• Augmenting with a manually transcribed dataset (TIMIT) was successful.
• Augmenting with an automatically phonetized dataset (Common Voice) was unsuccessful.
• Acoustically aligning the TIMIT data to PSST using Room Impulse Response (RIR) improved performance.
Models
We compared base and large wav2vec 2.0 models
• The larger model performed better
• Training with the base model was still useful for faster experimentation
Voice conversion
• Voice conversion was used to augment data by neural voice cloning
• Experiments with voice cloning proved unsuccessful, likely because of PSST data quality issues
Pitch shift
• Pitch shift was the most successful PSST augmentation, improving results compared to the baseline
• The pitch was both lowered and raised randomly
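A rough sketch of random pitch shifting, assuming a shift range of ±2 semitones (the slides do not state the range used). A shift of n semitones corresponds to a frequency ratio of 2^(n/12). The resampling shown here is a deliberately simple stand-in: it changes duration along with pitch, whereas a proper pitch shifter keeps the duration fixed. All function names are hypothetical.

```python
import math
import random

def pitch_ratio(semitones):
    """Frequency ratio for a shift of `semitones` (12 semitones = one octave)."""
    return 2.0 ** (semitones / 12.0)

def resample_linear(samples, ratio):
    """Read `samples` at `ratio` times normal speed using linear interpolation.

    ratio > 1 raises the pitch and shortens the clip; ratio < 1 does the opposite.
    """
    if not samples:
        return []
    last = len(samples) - 1
    n_out = max(1, int(len(samples) / ratio))
    out = []
    for k in range(n_out):
        pos = k * ratio
        i = min(int(pos), last)
        frac = pos - i
        nxt = samples[min(i + 1, last)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out

# Toy input: 100 ms of a 440 Hz tone at 16 kHz (stand-in for a speech clip).
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
rng = random.Random(0)
semitones = rng.uniform(-2.0, 2.0)  # assumed range; lowers or raises at random
shifted = resample_linear(tone, pitch_ratio(semitones))
```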
Gaussian Noise
• Gaussian noise was useful as a regularization technique
• A purely additive technique for data augmentation
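As an illustration of the additive nature of this augmentation, the sketch below adds zero-mean Gaussian noise at a chosen signal-to-noise ratio. The 20 dB target and the function name are assumptions for the example, not values from the slides.

```python
import math
import random

def add_gaussian_noise(samples, snr_db, seed=None):
    """Return a copy of `samples` with zero-mean Gaussian noise added at `snr_db`."""
    rng = random.Random(seed)
    # Mean signal power; a silent or empty input is returned unchanged.
    power = sum(s * s for s in samples) / max(len(samples), 1)
    if power == 0.0:
        return list(samples)
    # SNR(dB) = 10 * log10(signal_power / noise_power), solved for the noise std.
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [s + rng.gauss(0.0, noise_std) for s in samples]

# Toy input: 100 ms of a 440 Hz tone at 16 kHz (stand-in for a speech clip).
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
noisy = add_gaussian_noise(tone, snr_db=20, seed=0)
```

Because the noise is simply summed with the waveform, the original signal is left intact underneath it and the clip length does not change.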
Time stretch
• Time stretch is used to regularize the duration of the speech
• Limited improvement over the baseline
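One simple way to change duration without resampling is overlap-add (OLA): windowed frames are read from the input at one hop size and written to the output at another. The sketch below is a plain OLA under assumed frame and hop sizes; it produces audible artifacts on real speech, and it is not claimed to be the method used in the experiments.

```python
import math

def time_stretch_ola(samples, rate, frame=400, hop=100):
    """Overlap-add time stretch: rate > 1 shortens the clip, rate < 1 lengthens it."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    # Hann window used to cross-fade overlapping frames.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame) for n in range(frame)]
    n_out = int(len(samples) / rate)
    out = [0.0] * (n_out + frame)
    norm = [0.0] * (n_out + frame)
    out_pos = 0
    while out_pos < n_out:
        # Each output frame is copied from the matching (scaled) input position.
        in_pos = int(out_pos * rate)
        for n in range(frame):
            if in_pos + n >= len(samples):
                break
            out[out_pos + n] += samples[in_pos + n] * window[n]
            norm[out_pos + n] += window[n]
        out_pos += hop
    # Divide by the summed window weights so overlapping frames do not change gain.
    return [o / w if w > 1e-8 else 0.0 for o, w in zip(out[:n_out], norm[:n_out])]

# Toy input: 100 ms of a 440 Hz tone at 16 kHz (stand-in for a speech clip).
tone = [0.5 * math.sin(2 * math.pi * 440 * n / 16000) for n in range(1600)]
faster = time_stretch_ola(tone, 2.0)
slower = time_stretch_ola(tone, 0.5)
```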
Room impulse response
• Used to mimic the reverberant conditions of the PSST data
• Most successful augmentation of the TIMIT data
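Applying a room impulse response amounts to convolving the clean signal with the response of the room. The sketch below uses a synthetic, exponentially decaying impulse response as a stand-in; real experiments would normally use measured or simulated responses, and the decay constant and function names here are assumptions.

```python
import math
import random

def synthetic_rir(length, decay, seed=0):
    """Toy impulse response: a direct path followed by exponentially decaying noise."""
    rng = random.Random(seed)
    tail = [rng.uniform(-1, 1) * math.exp(-decay * n) for n in range(1, length)]
    return [1.0] + tail

def apply_rir(samples, rir):
    """Convolve `samples` with `rir`, truncate to the input length, match the input peak."""
    out = [0.0] * len(samples)
    for i, s in enumerate(samples):
        if s == 0.0:
            continue  # zero samples contribute nothing to the sum
        for j, h in enumerate(rir):
            if i + j >= len(out):
                break
            out[i + j] += s * h
    peak_in = max((abs(s) for s in samples), default=0.0)
    peak_out = max((abs(o) for o in out), default=0.0)
    if peak_out == 0.0:
        return out
    scale = peak_in / peak_out
    return [o * scale for o in out]

# A single click makes the added reverberant tail easy to see.
click = [1.0] + [0.0] * 7
reverberant = apply_rir(click, synthetic_rir(4, decay=0.5))
```

Truncating to the input length keeps the phoneme labels aligned with the clip, at the cost of cutting off the end of the reverberant tail.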
Augmentations on augmentations
We experimented with adding Pitch Shift + Time Stretch + Gaussian Noise + Room Impulse Response to TIMIT.
Because that’s how we roll.
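Stacking augmentations can be sketched as a chain in which each step is applied in order with its own probability. The two transforms below are trivial placeholders so the example stays self-contained; in a real pipeline those slots would hold pitch shift, time stretch, Gaussian noise, and room impulse response. The names and probabilities are assumptions.

```python
import random

def make_chain(steps, rng):
    """Build a function that applies each (name, transform, probability) step in order."""
    def apply(samples):
        applied = []
        for name, transform, prob in steps:
            # rng.random() is in [0, 1), so prob=1.0 always fires and prob=0.0 never does.
            if rng.random() < prob:
                samples = transform(samples)
                applied.append(name)
        return samples, applied
    return apply

# Placeholder transforms standing in for real audio augmentations.
def halve_gain(samples):
    return [0.5 * v for v in samples]

def pad_silence(samples):
    return list(samples) + [0.0, 0.0]

steps = [("gain", halve_gain, 1.0), ("pad", pad_silence, 1.0)]
augment = make_chain(steps, random.Random(0))
out, applied = augment([1.0, -1.0])
```

Returning the list of applied step names alongside the audio makes it easier to trace which combination produced a given training example.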
Language models
• We experimented with language models
• They led to little or no improvement over the baseline
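The slides do not say which kind of language model was tried. Purely as an illustration of what a phonetic language model does, the sketch below trains an add-one-smoothed bigram model over phoneme sequences and scores candidates with it. The training sequences and function names are made up.

```python
import math
from collections import Counter

def train_bigram_lm(sequences):
    """Count phoneme bigrams, with <s> and </s> marking sequence boundaries."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for seq in sequences:
        tokens = ["<s>"] + list(seq) + ["</s>"]
        vocab.update(seq)
        for a, b in zip(tokens, tokens[1:]):
            contexts[a] += 1
            bigrams[(a, b)] += 1
    return contexts, bigrams, vocab

def log_prob(seq, lm):
    """Log-probability of a phoneme sequence under add-one smoothing."""
    contexts, bigrams, vocab = lm
    tokens = ["<s>"] + list(seq) + ["</s>"]
    v = len(vocab)
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + v))
        for a, b in zip(tokens, tokens[1:])
    )

# Made-up ARPAbet-style training sequences.
lm = train_bigram_lm([["K", "AE", "T"], ["B", "AE", "T"], ["K", "AE", "B"]])
seen = log_prob(["K", "AE", "T"], lm)
scrambled = log_prob(["T", "AE", "K"], lm)
```

In decoding, a score like this would typically be combined with the acoustic model's score to rerank candidate transcriptions.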
Results
• Data augmentations improve model performance.
• Increasing the size of the model decreases feature error rate (FER) and phoneme error rate (PER).
• Manually transcribed speech from non-aphasic speakers (TIMIT) improves performance
– when Room Impulse Response is used to augment the data.
• The best-performing model combines aphasic and non-aphasic data
– 21.0% PER
– 9.2% FER
– relative improvement of 9.8%
• Data augmentation, larger model size, and additional non-aphasic data
sources can be helpful
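For readers unfamiliar with the metrics, a phoneme error rate is commonly computed as the edit distance between reference and predicted phoneme sequences, divided by the number of reference phonemes. The sketch below follows that common definition and also shows how a relative improvement is calculated. It is not the challenge's official scoring code, FER is not covered, and the sequences and the 0.25 and 0.20 figures are made up.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            # Cheapest of deletion, insertion, and substitution (or match).
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]

def phoneme_error_rate(refs, hyps):
    """Total edit distance divided by the total number of reference phonemes."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total

def relative_improvement(baseline, system):
    """Fraction by which an error rate dropped relative to the baseline."""
    return (baseline - system) / baseline

# Made-up example: one substitution and one deletion over six reference phonemes.
refs = [["HH", "AW", "S"], ["K", "AE", "T"]]
hyps = [["HH", "AW", "Z"], ["K", "AE"]]
per = phoneme_error_rate(refs, hyps)
gain = relative_improvement(0.25, 0.20)
```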
Thank you!

PSST challenge LREC
