A literature review on improving speech intelligibility in noisy environments
Tuan Anh Dinh
Oregon Health & Science University
November 20, 2018
Introduction
Speech intelligibility is degraded in noisy environments
Reducing background noise benefits hearing-impaired (HI) people and hearing-aid (HA) users (Kochkin 2000; Nabeleck 2006)
Use single-channel speech-enhancement algorithms for noise suppression
Speech-enhancement algorithms
Assume an additive model of noise: $y(n) = s(n) + d(n)$, where $y(n)$ is the noisy speech, $s(n)$ is the clean speech, and $d(n)$ is uncorrelated noise
Modify/filter the short-time spectral amplitude (STSA) of the degraded speech $y(n)$
Keep the degraded speech's phase
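For concreteness, here is a minimal Python sketch of this analysis-modification-synthesis framework (my own illustration, not code from the reviewed papers): the noisy signal is transformed with an STFT, the magnitude is modified by a gain supplied by one of the algorithms below, and the noisy phase is reused at synthesis. The parameter `compute_gain` is a hypothetical placeholder.

```python
# Minimal sketch (assumption, not from the reviewed papers): modify the
# short-time spectral amplitude of noisy speech and keep the noisy phase.
import numpy as np
from scipy.signal import stft, istft

def enhance(y, fs, compute_gain, nperseg=512):
    """y: noisy waveform; compute_gain: hypothetical function mapping the
    magnitude spectrogram to per-bin gains in [0, 1]."""
    _, _, Y = stft(y, fs, nperseg=nperseg)                 # complex STFT of y(n)
    mag, phase = np.abs(Y), np.angle(Y)                    # STSA and phase of degraded speech
    S_hat = compute_gain(mag) * mag * np.exp(1j * phase)   # modified magnitude, original phase
    _, s_hat = istft(S_hat, fs, nperseg=nperseg)
    return s_hat
```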
Speech-enhancement algorithms
Traditional techniques:
Direct estimation of the STSA: Power Spectrum Subtraction (Boll, 1979)
Filtering of the STSA: Wiener Filtering (Lim 1979), Ideal Binary Masking (IBM) (Wang, 2008)
Little improvement with stationary noise (Monaghan 2017)
Machine learning techniques:
Sparse Coding (Monaghan 2017)
GMMs (Hu and Loizou 2010) or DNNs (Healy 2013, 2015) for IBM
DNN (Monaghan 2017) for Wiener Filtering
Machine learning techniques such as deep neural networks (DNNs) can generalize well across noise conditions (May 2014, Chen 2016)
I will talk about
Traditional techniques:
Power Spectrum Subtraction
Wiener Filtering
Ideal Binary Masking (IBM)
Machine learning techniques:
Sparse Coding
DNN for Wiener Filtering.
Power Spectrum Subtraction
Calculate the power spectrum of the enhanced speech $|\hat{S}(f)|^2$:
$$|\hat{S}(f)|^2 = |Y(f)|^2 - E[|D(f)|^2]$$
where $|Y(f)|^2$ is the power spectrum of the degraded speech and $|D(f)|^2$ is the power spectrum of the noise
Obtain $E[|D(f)|^2]$ either from assumed properties of the noise or from actual measurements of the background noise in non-speech intervals
When the right-hand side is negative, set it to zero
Little improvement with stationary noise.
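A minimal sketch of the subtraction rule above, expressed as a gain that could be plugged into the generic `enhance` scaffold sketched earlier (an illustration only; `noise_psd` is assumed to be an estimate of $E[|D(f)|^2]$ obtained from non-speech frames):

```python
# Minimal sketch (assumption) of power spectrum subtraction with half-wave
# rectification: negative power estimates are floored at zero.
import numpy as np

def spectral_subtraction_gain(noisy_mag, noise_psd):
    """noisy_mag: |Y(f)| per (freq, frame); noise_psd: E[|D(f)|^2] per freq bin."""
    clean_psd = np.maximum(noisy_mag ** 2 - noise_psd[:, None], 0.0)
    return np.sqrt(clean_psd) / np.maximum(noisy_mag, 1e-12)
```

It could be used, for example, as `enhance(y, fs, lambda mag: spectral_subtraction_gain(mag, noise_psd))`.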
Workflow of Power Spectrum Subtraction
Figure: A typical speech enhancement system based on power spectrum
subtraction
Power Spectrum Subtraction
Figure: A speech signal degraded by narrowband noise (top), and the result of spectral subtraction (bottom). Source: the Internet
Wiener Filtering
Minimizing the error $E[|s(n) - \hat{s}(n)|^2]$, we obtain the Wiener filter's gain $H(f)$:
$$H(f) = \frac{E[|S(f)|^2]}{E[|S(f)|^2] + E[|D(f)|^2]}$$
where $|S(f)|^2$ is the power spectrum of the clean speech, $|D(f)|^2$ is the power spectrum of the noise, $s(n)$ is the clean signal, and $\hat{s}(n)$ is the enhanced signal.
The mean squared error (MSE) criterion is not strongly correlated with perception (Lim 1979)
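A minimal sketch of this gain (illustration only); in practice $E[|S(f)|^2]$ is unknown and must itself be estimated, for example by the DNN described later:

```python
# Minimal sketch (assumption) of the Wiener gain H(f) from estimated
# clean-speech and noise power spectra.
import numpy as np

def wiener_gain(speech_psd, noise_psd):
    return speech_psd / np.maximum(speech_psd + noise_psd, 1e-12)  # values in [0, 1]
```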
Wiener Filtering
Figure: Estimate of the clean signal after spectral subtraction (top) and the signal filtered by the Wiener filter (bottom)
Ideal Binary Masking
Idea: find speech-dominated time-frequency bins (1) and noise-dominated time-frequency bins (0)
Given the clean speech and the noise (hence "ideal"), construct the IBM from a time-frequency representation of the speech
Set each value of the IBM to 0 or 1 by comparing the SNR in each time-frequency bin against a preset threshold.
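A minimal sketch of the mask construction (illustration only; the mask is "ideal" because it needs the clean speech and the noise separately, which is only possible when the mixture is constructed artificially):

```python
# Minimal sketch (assumption) of the ideal binary mask over time-frequency bins.
import numpy as np

def ideal_binary_mask(speech_mag, noise_mag, threshold_db=0.0):
    local_snr_db = 20.0 * np.log10(np.maximum(speech_mag, 1e-12) /
                                   np.maximum(noise_mag, 1e-12))
    return (local_snr_db > threshold_db).astype(float)  # 1 = speech-dominated, 0 = noise-dominated
```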
Ideal Binary Masking
Figure: (A) 32-channel cochleagram of clean speech, (B) 32-channel cochleagram of speech-shaped noise, (C) IBM with 32 channels, (D) 32-channel cochleagram of the noise gated by the IBM
Sparse coding
Assume the clean speech's feature vector $s$ satisfies
$$s = D\alpha$$
where $D$ is a dictionary and $\alpha$ is a sparse coefficient vector
The enhanced speech is
$$\hat{s} = D\hat{\alpha}$$
such that
$$\|y - D\hat{\alpha}\|_2^2 < \epsilon$$
where $\epsilon$ is the desired error.
De-noising the degraded feature vector $y$ means making its $\hat{\alpha}$ sparse.
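A minimal sketch of the de-noising step using scikit-learn's OMP-based sparse coder (an illustration under the assumption that a dictionary `D` has already been learned from clean-speech features, e.g. by dictionary learning; it is not the exact solver used in the reviewed paper):

```python
# Minimal sketch (assumption) of sparse-coding enhancement: code each noisy
# feature vector with few dictionary atoms, then reconstruct s_hat = D * alpha_hat.
import numpy as np
from sklearn.decomposition import SparseCoder

def sparse_denoise(Y, D, n_nonzero=10):
    """Y: noisy feature vectors, one per row; D: dictionary with atoms in rows."""
    coder = SparseCoder(dictionary=D, transform_algorithm='omp',
                        transform_n_nonzero_coefs=n_nonzero)
    alpha_hat = coder.transform(Y)   # sparse coefficient vectors
    return alpha_hat @ D             # enhanced feature vectors
```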
Some vectors in dictionary D
Figure: Basis vectors in the dictionary obtained from clean speech
Sparse coding
Figure: Spectrograms of clean speech (top), noisy speech with babble
noise (middle), and enhanced speech using sparse coding (bottom)
Deep neural network
DNNs can represent complex mapping functions between input data and output data
Figure: ANN and DNN structures
Deep neural network—Input
Traditional features vs. auditory image model features
Traditional feature set:
Amplitude modulation spectrum (AMS) (Tchorz, 2003)
Perceptual linear prediction (PLP) (Hermansky, 1994)
Mel-frequency cepstral coefficients (MFCC)
Auditory image model (AIM); in our paper, AIM has 2 steps:
1 basilar membrane motion (BMM)
2 size-shape transformed auditory image (SSI)
The gammatone filter bank uses the equivalent rectangular bandwidth (ERB) scale: $\mathrm{ERB} = 24.7 \times (4.37 \times 10^{-3} f + 1)$
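For illustration, here is the ERB formula together with the ERB-rate spacing a gammatone filter bank typically uses for its centre frequencies (the ERB-rate mapping is the standard Glasberg and Moore form, added here for context, not quoted from the slides):

```python
# Minimal sketch (assumption): ERB bandwidth and ERB-rate-spaced centre
# frequencies for a gammatone filter bank.
import numpy as np

def erb(f):
    return 24.7 * (4.37e-3 * f + 1.0)            # ERB in Hz at centre frequency f (Hz)

def erb_spaced_centres(f_low, f_high, n_channels):
    erb_rate = lambda f: 21.4 * np.log10(4.37e-3 * f + 1.0)
    inv_rate = lambda e: (10.0 ** (e / 21.4) - 1.0) / 4.37e-3
    return inv_rate(np.linspace(erb_rate(f_low), erb_rate(f_high), n_channels))
```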
Amplitude modulation spectrum (AMS)
Log amplitude of the Fourier transform of the time-frequency representation (e.g. log filter banks) of speech.
Figure: AMS patterns generated from a voiced speech segment (top) and from speech-shaped noise (bottom). Bright and dark areas indicate high and low energies, respectively.
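A minimal sketch of how such AMS features might be computed from subband envelopes (my own illustration; segment length and the number of modulation bins are assumptions, not the values used by Tchorz, 2003):

```python
# Minimal sketch (assumption): AMS features as the log magnitude of the FFT of
# each filter-bank channel's envelope within a short segment.
import numpy as np

def ams_features(envelopes, n_mod_bins=15):
    """envelopes: (n_channels, n_frames) subband envelopes of one segment."""
    mod_spectrum = np.abs(np.fft.rfft(envelopes, axis=1))
    return np.log(mod_spectrum[:, :n_mod_bins] + 1e-12)
```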
PLP and MFCC
Relative spectral transform and perceptual linear prediction (RASTA-PLP) (Hermansky, 1994)
Estimate a smooth spectral envelope using linear predictive coding
Integrate auditory knowledge through the Bark scale:
$$\mathrm{Bark} = 13\arctan(0.00076\,f) + 3.5\arctan\left(\left(\frac{f}{7500}\right)^2\right)$$
Mel-frequency cepstral coefficients (MFCC)
Estimate a smooth spectral envelope using cepstral coefficients
Integrate auditory knowledge through the Mel scale:
$$\mathrm{Mel} = 2595\log_{10}\left(1 + \frac{f}{700}\right)$$
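Both warpings are simple closed-form functions of frequency; a minimal sketch for computing them:

```python
# Minimal sketch: the Bark and Mel warpings quoted above.
import numpy as np

def hz_to_bark(f):
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

# e.g. hz_to_mel(1000.0) is about 1000 mel, hz_to_bark(1000.0) is about 8.5 Bark.
```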
An example of PLP
Figure: Spectrogram (top), PLP approximation (bottom)
An example of MFCC
Figure: Spectrogram (left), MFCC approximation (right)
Auditory image model
Basilar membrane motion (BMM)
simulates how sound waves cause basilar membrane motion in the human cochlea
Size-shape transformed auditory image (SSI)
produces the same pattern for vowels spoken by speakers with different glottal pulse rates or vocal tract lengths
Basilar membrane motion
Figure: Basilar membrane response to vowel /ae/
Deep Neural Network—Output
Output: Wiener filter’s gain (Monaghan 2017) / IBM (Healy
2013, 2015)
Loss functions:
Use Short Time Objective Intelligibility (STOI) (Monaghan
2017)
Use Normalized Covariance Metric (NCM) (Monaghan 2017)
Use hit rates (HIT): percentage of speech-dominated
time-frequency bins correctly classified by binary mask (Healy
2013, 2015), (Monaghan 2017)
Use false alarms (FA): percentage of noise-dominated
time-frequency bins incorrectly classified as speech-dominated
(Healy 2013, 2015), (Monaghan 2017)
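For the mask-based measures, here is a minimal sketch of how HIT and FA rates can be computed from an estimated binary mask and the corresponding IBM (my own illustration):

```python
# Minimal sketch (assumption): HIT and FA rates for a binary mask, given the IBM.
import numpy as np

def hit_fa(estimated_mask, ideal_mask):
    speech_bins = ideal_mask == 1
    noise_bins = ideal_mask == 0
    hit = np.mean(estimated_mask[speech_bins] == 1)  # speech-dominated bins kept
    fa = np.mean(estimated_mask[noise_bins] == 1)    # noise-dominated bins wrongly kept
    return hit, fa                                   # often summarized as HIT - FA
```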
Deep Neural Network—Framework
Figure: DNN-based speech enhancement. NN_COMP is a DNN trained on traditional features. NN_AIM is a DNN trained on AIM features
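For concreteness, a minimal PyTorch sketch of a feed-forward network that maps a frame of acoustic features to per-channel gains; the layer sizes, activation, and training loss are illustrative assumptions, not the architecture of NN_COMP or NN_AIM:

```python
# Minimal sketch (assumption): a DNN that estimates Wiener-style gains in [0, 1]
# from a frame of input features; trained e.g. with MSE against oracle gains.
import torch
import torch.nn as nn

class GainEstimator(nn.Module):
    def __init__(self, n_features, n_channels, hidden=1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_channels), nn.Sigmoid(),  # gains in [0, 1]
        )

    def forward(self, x):
        return self.net(x)
```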
HIT-FA
Figure: HIT-FA scores for neural network based algorithms. To calculate
HIT-FA scores, the ratio masks (estimated and ideal) were converted to
binary masks
Normalized Covariance Metric
Figure: Mean values of NCM for the sentences used in each condition.
Short Time Objective Intelligibility
Figure: Mean values of STOI for the sentences used in each condition.
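As an aside, STOI scores like these can be computed with the third-party pystoi package, assuming it is installed and that the clean and enhanced signals are time-aligned at the same sampling rate (illustration only, not the implementation used in the reviewed papers):

```python
# Minimal sketch (assumption): objective intelligibility with the pystoi package.
from pystoi import stoi

def stoi_score(clean, enhanced, fs):
    return stoi(clean, enhanced, fs, extended=False)  # value roughly in [0, 1]
```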
Speech intelligibility
Use speech recognition scores as a subjective measure.
Assessed as the percentage of keywords identified correctly in sentences spoken by a British male speaker
17 participants, all native speakers of British English
2 kinds of noise: speech-shaped noise (SSN) and babble noise
2 SNR conditions: 0 dB and +4 dB
Subjective Intelligibility
Figure: Percentage of keywords correctly recognized for each of the 5 systems in speech-shaped noise (SSN) and babble noise
Speech quality
Same participants as in the speech intelligibility test.
Participants were asked to rate the perceived quality of the speech on a scale from 0 to 7 (0 is bad, and 7 is excellent)
Speech quality
Figure: Speech quality ratings
Speech intelligibility gain vs speech quality gain
Figure: Group-mean improvement in speech quality versus improvement
in speech intelligibility for the four algorithms in each noise condition.
Correlation between objective and subjective measurements
Figure: Average values of the objective measures NCM and STOI plotted
as a function of the mean intelligibility scores
Conclusion
Significant improvement in speech recognition scores and quality for the 3 machine learning algorithms in at least one of the four conditions
No significant improvement for Wiener filtering
For DNNs, auditory features perform better than traditional features, but the difference is not significant
DNNs perform better than sparse coding
