24 Feb 2014
Takuya Yoshioka
NTT CS Labs, Cambridge University
Thanks to: T. Nakatani, K. Kinoshita, M. Delcroix (NTT)
M. Gales, X. Chen (Cambridge)
Speech Enhancement for ASR
• Effectiveness measured by WER
– use of a sensible ASR system essential
• Huge computational resources available
• Offline processing allowed
• The acoustic model (AM) can also do part of the job
Typical ASR System
[Block diagram: Signal → Speech Enh → Front-End → Recog Engine → Sentence; the Recog Engine draws on the AM, LM and Pron Dict]
Different Approaches for Different Situations
• 1ch vs. Mch (M > 1)
• background noise;
• reverberant noise; or
• interfering talkers
1ch Dereverberation (Offline)
• Reverberation usually modelled with FIR
$$x[t] = \sum_{\tau=0}^{T} h[\tau]\, s[t-\tau]$$
• Given (x[t])_{t=1,…,N}, recover (s[t])_{t=1,…,N}
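The FIR model on this slide can be tried out with a short simulation. This is only a sketch: the exponentially decaying random impulse response, its length and the white-noise stand-in for speech are illustrative assumptions, not values from the talk.

```python
import numpy as np

def reverberate(s, h):
    """Return x[t] = sum_{tau=0}^{T} h[tau] * s[t - tau], same length as s."""
    # np.convolve gives the full convolution; keep the first len(s) samples
    # so that x covers the same N samples as s.
    return np.convolve(s, h)[: len(s)]

# Hypothetical toy impulse response: direct path plus a decaying random tail.
rng = np.random.default_rng(0)
T = 400
h = rng.standard_normal(T + 1) * np.exp(-np.arange(T + 1) / 80.0)
h[0] = 1.0  # direct path

s = rng.standard_normal(2000)  # white-noise stand-in for clean speech
x = reverberate(s, h)          # simulated reverberant observation
```

Dereverberation is the inverse problem: only x is observed, and both h and s are unknown.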
Approaches
• Time domain
– subspace, Trinicon, Long-term LP
– accurate
– can account for phase distortion
• Power spectral domain
– WF, NMF
– robust against speaker movement
• Feature domain
– front-end VTS, direct CMLLR
– can leverage the AM
[Block diagram: x[t] → Analysis → x_k(t) → Dereverb (one per sub-band) → s_k(t) → Synthesis → s[t]]
Assume in each sub-band
$$x_k(t) = \sum_{\tau=0}^{T} h_k^*(\tau)\, s_k(t-\tau)$$
Inverse Filtering (in Each Sub-band)
$$s_k(t) = \sum_{\tau=0}^{U} g_k^*(\tau)\, x_k(t-\tau)$$
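Long-Term Linear Prediction
$$x_k(t) = \sum_{\tau=\Delta}^{U} a_k^*(\tau)\, x_k(t-\tau) + e_k(t)$$
(the prediction error e_k(t) is identified with the clean speech s_k(t))
$$s_k(t) = x_k(t) - \sum_{\tau=\Delta}^{U} a_k^*(\tau)\, x_k(t-\tau)$$
we don’t minimise e_k(t)!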
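As a quick check of the two formulas on this slide, the sketch below builds x_k(t) from a known s_k(t) with the delayed LP recursion and then recovers s_k(t) by subtracting the prediction. The coefficient values, the delay Δ = 3 and the function name `delayed_lp_residual` are hypothetical choices for illustration; how the coefficients are actually estimated is a separate question.

```python
import numpy as np

def delayed_lp_residual(x, a, delay):
    """s(t) = x(t) - sum_{tau=delay}^{U} conj(a(tau)) * x(t - tau).

    a[i] is the coefficient for tau = delay + i (delay must be >= 1).
    Samples before the start of the signal are treated as zero.
    """
    s = x.astype(complex)  # astype returns a copy, so x is left untouched
    for i, coeff in enumerate(a):
        tau = delay + i
        s[tau:] -= np.conj(coeff) * x[:-tau]
    return s

# Hypothetical coefficients for tau = 3, 4, 5 (small enough to be stable).
delay = 3
a = np.array([0.4 + 0.1j, -0.2j, 0.1])

rng = np.random.default_rng(0)
N = 200
s_true = rng.standard_normal(N) + 1j * rng.standard_normal(N)

# Generate x with the LP recursion x(t) = prediction from the past + s(t).
x = np.zeros(N, dtype=complex)
for t in range(N):
    pred = sum(np.conj(a[i]) * x[t - (delay + i)]
               for i in range(len(a)) if t - (delay + i) >= 0)
    x[t] = pred + s_true[t]

s_hat = delayed_lp_residual(x, a, delay)
```

Because the prediction only uses samples at least Δ steps in the past, the most recent Δ − 1 samples are never subtracted from x_k(t).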
Why LP?
LP vs. FIR
LP:
$$x_k(t) = \sum_{\tau=\Delta}^{U} a_k^*(\tau)\, x_k(t-\tau) + s_k(t)$$
FIR:
$$x_k(t) = \sum_{\tau=0}^{T} h_k^*(\tau)\, s_k(t-\tau)$$
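Speech model + LP model:
$$s_k(t) \sim N(0, \lambda_{k,t}), \qquad x_k(t) = \sum_{\tau=\Delta}^{U} a_k^*(\tau)\, x_k(t-\tau) + s_k(t)$$
Resulting model of the observation (with y_k(t) the observed sub-band signal):
$$y_k(t) \mid (y_k(t-t'))_{t'=1,\dots,U} \sim N\Big(\sum_{\tau=\Delta}^{U} a_k^*(\tau)\, y_k(t-\tau),\ \lambda_{k,t}\Big)$$
$$\log p\big((y_k(t))_{t=1,\dots,N}\big) = \sum_{t=1}^{N} \log f_{\mathrm{Normal}}\Big(y_k(t);\ \sum_{\tau=\Delta}^{U} a_k^*(\tau)\, y_k(t-\tau),\ \lambda_{k,t}\Big)$$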
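As a worked step, and assuming the Normal density here is a zero-mean circular complex Gaussian (an assumption; the slide only writes N), the log-likelihood can be written out as

$$\log p\big((y_k(t))_{t=1,\dots,N}\big) = -\sum_{t=1}^{N} \left[ \log(\pi \lambda_{k,t}) + \frac{\big| y_k(t) - \sum_{\tau=\Delta}^{U} a_k^*(\tau)\, y_k(t-\tau) \big|^2}{\lambda_{k,t}} \right]$$

With the variances held fixed, maximising over the LP coefficients amounts to a least-squares problem in which each sample is weighted by 1/λ_{k,t}. With the coefficients held fixed, setting the derivative with respect to λ_{k,t} to zero gives

$$-\frac{1}{\lambda_{k,t}} + \frac{|s_k(t)|^2}{\lambda_{k,t}^2} = 0 \quad\Rightarrow\quad \lambda_{k,t} = |s_k(t)|^2, \qquad s_k(t) = y_k(t) - \sum_{\tau=\Delta}^{U} a_k^*(\tau)\, y_k(t-\tau)$$

The same variance update follows for a real-valued Gaussian, so this step does not hinge on the complex assumption.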
Interleaved Estimation of:
- LP coeffs A = (a_k(t))_{t=Δ,…,U} + speech variances Λ = (λ_{k,t})_{t=1,…,T}
- clean speech samples
[Flowchart boxes: Initialise A; Calculate s_k(t); Estimate LP coeffs A; Convergent?; Estimate speech vars Λ]
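The interleaved loop on this slide might be sketched for one sub-band as follows. Several details are assumptions rather than the author's implementation: the variance is set to |s_k(t)|² with a small floor, the coefficients come from variance-weighted least squares, a fixed number of iterations replaces the convergence test, and the names `delayed_taps` and `long_term_lp` are hypothetical.

```python
import numpy as np

def delayed_taps(x, delay, order):
    """Matrix whose column i holds x delayed by (delay + i) samples."""
    X = np.zeros((len(x), order), dtype=complex)
    for i in range(order):
        tau = delay + i
        X[tau:, i] = x[:-tau]
    return X

def long_term_lp(x, delay=3, order=10, iterations=3, eps=1e-8):
    """Alternate speech-variance and LP-coefficient updates for one sub-band.

    x is a 1-D array of sub-band samples x_k(t). Returns the estimate of
    s_k(t) and g, where g[i] = conj(a_k(delay + i)).
    """
    X = delayed_taps(x, delay, order)
    g = np.zeros(order, dtype=complex)            # initialise A (all zeros)
    for _ in range(iterations):
        s = x - X @ g                              # calculate s_k(t)
        lam = np.maximum(np.abs(s) ** 2, eps)      # estimate speech variances
        Xw = X / lam[:, None]                      # rows weighted by 1/lambda
        R = Xw.conj().T @ X                        # weighted covariance
        r = Xw.conj().T @ x                        # weighted correlation
        g = np.linalg.solve(R + eps * np.eye(order), r)  # estimate LP coeffs
    return x - X @ g, g

# Toy usage on a random complex signal (a stand-in for one STFT sub-band).
rng = np.random.default_rng(0)
x_demo = rng.standard_normal(500) + 1j * rng.standard_normal(500)
s_demo, g_demo = long_term_lp(x_demo, delay=3, order=10, iterations=3)
```

Weighting each sample by the inverse of the current speech variance is what separates this from ordinary least-squares LP, which would minimise the plain prediction error.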
Eval on REVERB Challenge Data Sets
System                                   %WER
DNN AM + RNN LM + AM adapt               20.0
Dereverb + DNN AM + RNN LM + AM adapt    16.5
• prompts from 5K WSJ
• trained on multi-condition data
• tested on real recordings from dev set
• small amount of background noise
Eval on AMI Corpus (Meeting Transcription)
System                          %WER (Dev)   %WER (Eval)
DNN AM + 3gram LM               43.5         42.6
Dereverb + DNN AM + 3gram LM    42.0         41.1
• 4 participants in each meeting
• table-top microphone used
• single-speaker segments used
• severe reverberation and background noise
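To put the single-channel numbers on a common scale, the relative WER reductions can be computed directly from the figures reported on the REVERB and AMI slides. The helper name `relative_wer_reduction` is just an illustrative choice.

```python
def relative_wer_reduction(baseline, system):
    """Relative WER reduction in percent."""
    return 100.0 * (baseline - system) / baseline

# (baseline %WER, dereverb %WER) as reported on the slides.
results = {
    "REVERB (real recordings, dev set)": (20.0, 16.5),
    "AMI dev": (43.5, 42.0),
    "AMI eval": (42.6, 41.1),
}

for name, (base, derev) in results.items():
    print(f"{name}: {relative_wer_reduction(base, derev):.1f}% relative")
```

This works out to about 17.5% relative on REVERB and about 3.4% to 3.5% relative on AMI, so the gain is much smaller on the meeting data.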
1ch Algorithm Summary
• very robust against modelling errors
• keys in development
– modelling the reverberation with LP
– using a reasonable clean speech pdf
Multi-Channel Extension
[Block diagram: Dereverb → BF → To recogniser]
• LP  MIMO LP
)()()()( ttt k
U
kkk exΑx +−= ∑∆=
∗
τ
ττ
)(tskh
• LP  MIMO LP
• single speech model  vector speech model
)()()()( ttt k
U
kkk exΑx +−= ∑∆=
∗
τ
ττ
)(tskh
),0(~)( ,tkk Nts λ ),0(~)( ,tkk Nts λ∗
hhh
),0( ,tkN λI≈
⇔
Interleaved Estimation of:
- LP matrices A = (A_k(t))_{t=Δ,…,U} + speech variances Λ = (λ_{k,t})_{t=1,…,T}
- clean speech samples
[Flowchart boxes: Initialise A; Calculate s_k(t); Estimate LP matrices A; Convergent?; Estimate speech vars Λ]
Eval on REVERB Challenge Data Sets
#Mics   System                                   %WER
1       Baseline (DNN AM + RNN LM + AM adapt)    20.0
1       Dereverb + Baseline                      16.5
2       Dereverb + Baseline                      14.8
2       Dereverb + MVDR + Baseline               13.6
8       Dereverb + Baseline                      14.0
8       Dereverb + MVDR + Baseline               11.3
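The table names MVDR as the beamformer applied after dereverberation but gives no further detail. For orientation, the sketch below computes textbook MVDR weights for one frequency bin, w = R⁻¹h / (hᴴR⁻¹h). The steering vector, the noise covariance estimate and all variable names are assumptions for illustration, not the configuration used in the talk.

```python
import numpy as np

def mvdr_weights(R_noise, h):
    """Textbook MVDR weights w = R^{-1} h / (h^H R^{-1} h) for one bin."""
    Rinv_h = np.linalg.solve(R_noise, h)
    return Rinv_h / (h.conj() @ Rinv_h)

def apply_beamformer(w, X):
    """X has shape (channels, frames); returns w^H x(t) for every frame."""
    return w.conj() @ X

# Toy example with hypothetical data for a single frequency bin.
rng = np.random.default_rng(0)
M, frames = 8, 500
h = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # assumed steering vector
noise = rng.standard_normal((M, frames)) + 1j * rng.standard_normal((M, frames))
R_noise = noise @ noise.conj().T / frames                  # sample noise covariance

w = mvdr_weights(R_noise, h)
speech = rng.standard_normal(frames)                       # stand-in source signal
X = np.outer(h, speech) + noise                            # multi-channel observation
y = apply_beamformer(w, X)                                 # single-channel output
```

By construction wᴴh = 1, so a source arriving along the assumed steering vector passes undistorted while the output noise power is minimised.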
Long-Term LP Summary
• very robust against modelling errors
• can cover both 1ch and Mch set-ups
• keys in development
– modelling the reverberation with LP
– using a reasonable clean speech pdf
Extensions Explored
• dereverberation+BSS
• adaptive long-term LP
• NMF-based dereverberation
– works in the power spectrum domain
• FE-VTS dereverberation
Dereverberation+BSS
[Block diagram: Dereverb → BSS]
[Bar chart: SIR (dB), axis from 0 to 16, at T60 = 0.3 s and T60 = 0.5 s, comparing w/o separation, separation, and dereverberation+separation]
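The chart reports signal-to-interference ratio (SIR) in dB. The slides do not say how it was measured; a common definition is the power ratio between the target and interference components of a signal, sketched here under that assumption with a hypothetical helper `sir_db`.

```python
import numpy as np

def sir_db(target, interference):
    """SIR in dB: 10 * log10(target power / interference power)."""
    p_target = np.sum(np.abs(target) ** 2)
    p_interf = np.sum(np.abs(interference) ** 2)
    return 10.0 * np.log10(p_target / p_interf)

# Toy check: an interferer at one tenth of the target amplitude.
target = np.ones(100)
interference = 0.1 * np.ones(100)
example_sir = sir_db(target, interference)
```

In practice this requires access to the separated target and interference components, which is usually only available with simulated mixtures.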
Conclusion
• Dereverberation based on long-term LP
– represents reverberation with LP
– consistent framework covering both 1ch and Mch set-ups
– provides gains over well-optimised DNN AMs in realistic conditions
– extensions in several directions described

Speech enhancement for distant talking speech recognition
