S2.11

A RECURRENT TIME-DELAY NEURAL NETWORK FOR IMPROVED PHONEME RECOGNITION

Fabio GRECO, Andrea PAOLONI, Giacomo RAVAIOLI
Fondazione Ugo Bordoni, Via Baldassarre Castiglione, 59 - 00142 Roma
ABSTRACT
In this work we propose a modification to the well-known Time-Delay Neural Network structure, obtained through a feedback at the first-hidden layer level.

The experiment carried out with the new model, called RTDNN (Recurrent Time-Delay Neural Network), consists of the classification of the unvoiced plosive phonemes. These were extracted from initial and intermediate positions in a list of the most common Italian words, uttered by a male speaker, thus obtaining 250 tokens per phoneme. Training was carried out with a modified variant of Back Propagation, known as BPS (Back Propagation for Sequences), using half of the tokens for learning and the remaining half for the test. The error rate trend thus obtained shows a 28% decrease in a particular range of β (the magnitude of feedback), with values ranging from 5% for the original TDNN model with no feedback to 3.6% for our RTDNN model.
INTRODUCTION
A neural network used for phoneme recognition has to take into account the sequential nature of speech. Unlike static pattern classification, the net must be able to exploit the information coming from the temporal dimension, and decisions must be linked to the way patterns evolve in time rather than to their shape at particular instants.

A solution adopted by many researchers is the extension of the input window, considering a network activated simultaneously by a large portion of speech. In this way they get round the difficulty of explicitly treating time and return to static pattern classification.
Nevertheless, the question of temporal window extension is quite troublesome, and experiments up to now have been based on heuristic assumptions [1, 2].
Initially, it might seem that the window should be wide enough to contain as much contextual information as possible. There are, however, at least three fundamental problems connected to the use of a wide signal window.

The first problem is that contextual information is not equally distributed, since the frames closer to the ones being classified are more important than the ones further away. The net should then exhibit a "forgetting effect" in order to exclude information that is too old and give more importance to the closer frames. The difficulty with a wide signal window lies in the fact that it is not known how to weight the window in order to assure this effect.

The second problem is connected to local speech rate variations. Though with a small window such variations have a negligible effect, with wider windows considerable difficulties come up. In this case, learning to take such variations into account is very difficult, and this leads to a worsening in performance.

The third problem is that a large window requires a net with a large number of weights in the input layer, and consequently a large amount of training time and data.
Time-Delay Neural Networks (TDNNs) [3] avoid the last
two problems with a network activated (in the input layer) by
signal windows of only 30 ms. The use of contextual
information based on a wider window (150 ms) occurs in the
upper network layer, where the extraction of low-level acoustical
features has already been achieved.
So there is the advantage of separating the acoustical feature extraction (sensitive to local signal distortion) from the integration of those features along the time dimension. By means of this separation, the possible effects of temporal variations can be better handled in the upper network layer, since most information is already codified there.
RTDNN
TDNNs, as we said, take into account a context of 150 ms
but low-level signal analysis is limited to a window of 30 ms.
Instead, it would be useful for the internal representation computed by the first-hidden layer neurons to be influenced by a wider portion of speech, so that the context is taken into account to a greater extent in the extraction of acoustical features. Solving this problem merely by extending the 30 ms signal window is not effective, for the reasons given above.
Our solution takes a wider context into account while avoiding the problems related to enlarging the window. The method consists of using a high-level memory, generated through a feedback between the codified information produced by the first-hidden layer neuron outputs at time "t" and the input of the same neurons at time "t+1" (Fig. 1). In the spatially-extended representation of the new model, which we call RTDNN (Recurrent Time-Delay Neural Network), this modification can be seen as a sequence of connections acting horizontally within the first-hidden layer, between a frame of eight neurons and the following one (Fig. 2). The feedback weights are the same in all the time-shifted copies of the connections, thereby preserving the important time-invariance property of the original TDNN model.
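To make the structure concrete, the following is a minimal NumPy sketch of this recurrence (our own illustration; the paper publishes no code). Frame and layer sizes follow Fig. 2: 16 spectral coefficients per frame, a 3-frame input window and 8 first-hidden units per frame; all identifiers are ours.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rtdnn_hidden_layer(frames, W, b, w_fb, output_feedback=True):
    """Forward pass of the RTDNN first-hidden layer.

    frames : (T, 16) array of spectral frames.
    W      : (8, 48) weights over a 3-frame window (3 x 16 inputs).
    b      : (8,) biases.
    w_fb   : (8,) self-feedback weights, shared by every time-shifted copy
             of the connections (this is what keeps the model time-invariant).
    """
    a = np.zeros(8)                      # previous activations, a_i(0) = 0
    outputs = []
    for t in range(frames.shape[0] - 2):
        x = frames[t:t + 3].reshape(-1)  # x(t): frames t, t+1, t+2 concatenated
        net = W @ x + b
        net += w_fb * (sigmoid(a) if output_feedback else a)
        a = net
        outputs.append(sigmoid(a))
    return np.stack(outputs)             # (T-2, 8): one hidden frame per step
```

Setting w_fb to zero removes the memory and recovers the plain TDNN hidden layer.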
The network thus developed has the advantage of coping better with the sequential nature of speech; in this way speech is treated as a sequence of events rather than a succession of static patterns without temporal ordering.

Unlike the TDNN, our model can explicitly exploit, during acoustic feature extraction, information hidden in the temporal structure of speech. This is particularly important for phonemes like plosives or nasals, which exhibit a clear sequential structure. The feedback action, in considering the contribution of closer frames, is qualitatively different from that obtained by extending the analysis window.
Fig. 1: The RTDNN first-hidden layer neuron.

Fig. 2: The RTDNN architecture (output nodes /p/, /t/, /k/; input window of 15 frames = 150 ms).
As can be seen in Fig. 2, the fed-back information is the one coming from the coding performed by the first hidden neurons, not directly from the signal. This information is not limited to a finite, prefixed time interval (determined by the topology of the connections toward the input) but extends backward in time, with an extent determined adaptively by the feedback weight magnitude.

This consideration raises the problem of stability: the contribution of past frames must in fact decrease going backward in time, and this condition is assured only for some values of the feedback weights. Let us see why.
Referring to Fig. 2, let x(t) be the vector constructed as the concatenation of the three input vectors with temporal coordinates t, t+1, t+2. Calling $a_i$ the activation of neuron $i$ of the first hidden layer (the value preceding the sigmoid function), and initially assuming an activation feedback, we can write:

$$a_i(t) = w_{ii}\, a_i(t-1) + \sum_j w_{ij}\, x_j(t) + b_i, \qquad a_i(0) = 0 \qquad (1)$$

where $w_{ii}$ is the feedback weight. By induction we get:

$$a_i(T) = \sum_{t=1}^{T} w_{ii}^{\,T-t}\, P_i(t), \qquad \text{with } P_i(t) = \sum_j w_{ij}\, x_j(t) + b_i \qquad (2)$$
In order to estimate the influence of the signal window at time $t-\tau$ on neuron $i$ at time $t$, let us define the following parameter:

$$\delta(t,\tau) = \frac{\partial a_i(t)}{\partial P_i(t-\tau)} = w_{ii}^{\,\tau} \qquad (3)$$

From this we can see that, in order to ensure the desired stability or "forgetting effect", the feedback weights must be constrained to the range $(-1,+1)$.
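A small numeric check of (2) and (3), under our own variable names: the closed form agrees with the step-by-step recurrence, and the influence $w_{ii}^{\tau}$ of a frame $\tau$ steps back dies out only when the feedback weight lies inside $(-1,+1)$.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=20)      # stands in for P_i(t) = sum_j w_ij x_j(t) + b_i

def unrolled(w_fb, P):
    """a_i(T) via the recurrence a(t) = w_fb * a(t-1) + P(t), with a(0) = 0."""
    a = 0.0
    for p in P:
        a = w_fb * a + p
    return a

T = len(P)
for w_fb in (0.5, 1.5):
    closed = sum(w_fb ** (T - t) * P[t - 1] for t in range(1, T + 1))  # eq. (2)
    assert np.isclose(unrolled(w_fb, P), closed)
    print(w_fb, [w_fb ** tau for tau in (1, 5, 10)])   # delta(t, tau), eq. (3)
# w_fb = 0.5: the influence decays (forgetting effect);
# w_fb = 1.5: it grows without bound, i.e. the recurrence is unstable.
```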
Observing formula (2), we notice that the transfer function of the neuron with activation feedback is similar to that of an IIR (Infinite Impulse Response) filter:

$$\sum_{i=0}^{T} a^i z^{-i} \simeq \frac{1}{1 - a z^{-1}} \qquad (4)$$

which becomes an equality for $T$ infinite. The $a$ coefficient is thus related to the feedback weight $w_{ii}$ of the RTDNN neuron.

The transfer function of the TDNN neuron is instead similar to that of a FIR (Finite Impulse Response) filter, because it is affected by the contribution of only a finite number $D$ of frames. This transfer function is of the following kind:

$$\sum_{i=0}^{D} a_i z^{-i} \qquad (5)$$

where now the various $a_i$ are different, and each of them is related to the connection matrix between the TDNN neuron and the input frame at time $t-i$.
In the case of output feedback, formula (1) becomes:

$$a_i(t) = w_{ii}\, f(a_i(t-1)) + \sum_j w_{ij}\, x_j(t) + b_i \qquad (6)$$

where $f(x)$ is the sigmoid function, and it is easily proven that in this case:

$$|\delta(t,\tau)| \le \left(\frac{|w_{ii}|}{4}\right)^{\tau} \qquad (7)$$

If $-4 < w_{ii} < +4$, the effect of past frames will be modulated by decreasing terms, thereby ensuring the desired effect.

As can be seen in (6), output feedback is non-linear, because of the sigmoid function, and the correspondence with IIR filters is not so immediate: the RTDNN neuron with output feedback can thus be described as a kind of non-linear IIR filter. Because of the greater generality of this model, we adopted it in our experiment with the RTDNN.
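The bound (7) follows from the fact that the sigmoid derivative never exceeds 1/4, so each step through $f$ attenuates the influence of the past by at least $|w_{ii}|/4$; a quick numeric check (our illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # maximum value 1/4, reached at z = 0

w_fb = 3.0                        # allowed: -4 < w_fb < +4
z = np.random.default_rng(1).normal(size=1000)
# one-step sensitivity: |da(t)/da(t-1)| = |w_fb * f'(.)| <= |w_fb| / 4
assert np.all(np.abs(w_fb * d_sigmoid(z)) <= abs(w_fb) / 4 + 1e-12)
# chaining tau steps gives |delta(t, tau)| <= (|w_fb| / 4) ** tau, eq. (7)
print((abs(w_fb) / 4) ** np.arange(1, 6))   # decreasing, since |w_fb| < 4
```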
THE LEARNING ALGORITHM
The introduction of feedback into the TDNN structure creates some trouble for the learning algorithm.

The algorithm used to train the RTDNN follows an approach similar to those proposed by Gori [4] and Kuhn [5]. The method, known as BPS (Back-Propagation for Sequences), avoids the backward path in time to the initial point during the back-propagation stage, as required by Rumelhart's "time-unfolding" technique. This comes at the price of some additional variables, computed in the feed-forward stage and propagated forward in time.
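A sketch of the idea as we understand BPS [4], restricted to a single neuron with an output-feedback self-loop (the only recurrent connections in the RTDNN); names and details are our assumptions. The derivative of the activation with respect to the feedback weight obeys its own forward recurrence, so one extra variable per neuron replaces unfolding back through time.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def forward_with_bps(nets, w_fb):
    """Neuron with output feedback: a(t) = w_fb * f(a(t-1)) + net(t).

    Alongside a(t) we carry g(t) = da(t)/dw_fb, which obeys
        g(t) = f(a(t-1)) + w_fb * f'(a(t-1)) * g(t-1),   g(0) = 0,
    so the gradient information flows forward and no unfolding is needed.
    """
    a, g = 0.0, 0.0
    history = []
    for net in nets:                       # net(t) = sum_j w_ij x_j(t) + b_i
        g = sigmoid(a) + w_fb * d_sigmoid(a) * g
        a = w_fb * sigmoid(a) + net
        history.append((a, g))
    return history

# During back-propagation, dE/dw_fb accumulates dE/da(t) * g(t) over time.
print(forward_with_bps([0.3, -0.1, 0.7], w_fb=2.0))
```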
The bound on the feedback weights, $-4 < w_{ii} < +4$ (necessary to ensure the stability of the model), has been implemented by introducing control variables $k_{ii}$ and defining:

$$w_{ii} = \rho \tanh(k_{ii}), \qquad \text{with } 0 \le \rho \le 4 \qquad (8)$$

The $k_{ii}$ variables, which are unbounded, are therefore the ones modified by the learning algorithm. The $\rho$ parameter introduced in (8) allows the feedback amplitude to be varied continuously within the range $0 \le \rho \le 4$.

By putting $\rho = 0$, the described algorithm becomes the usual Back-Propagation algorithm and the RTDNN net is converted into Waibel's TDNN.
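In code, the constraint (8) is a reparameterization: learning updates the unbounded $k_{ii}$, and the chain rule supplies the extra factor coming from the tanh squashing (a minimal sketch, with our own names):

```python
import numpy as np

rho = 2.8                                 # feedback amplitude, 0 <= rho <= 4

def feedback_weight(k):
    return rho * np.tanh(k)               # eq. (8): |w_ii| always below rho

def grad_wrt_k(dE_dw, k):
    # chain rule: dE/dk = dE/dw * dw/dk = dE/dw * rho * (1 - tanh(k)^2)
    return dE_dw * rho * (1.0 - np.tanh(k) ** 2)

print(feedback_weight(np.array([-2.0, 0.0, 2.0])))   # stays in (-rho, +rho)
# rho = 0 removes the feedback entirely: the net degenerates to Waibel's TDNN.
```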
EXPERIMENT SET-UP
The aim of the experiments was to compare the performance of the new model with that of TDNNs. It was also of interest to measure, for the Italian language, the classification capabilities of both models. To make this comparison we decided to treat the classification of plosive phonemes in our experiment with RTDNNs, since experiments with these phonemes are accurately described in other papers about TDNNs. In particular, we chose the unvoiced plosive phonemes (/p/, /t/, /k/), which are easier to segment.
The network we used in our experiment is the one depicted
in Fig. 2, with three output nodes and feedback connections.
Database
In order to prepare the training and test database, we chose 105 Italian words, among the 10,000 most common ones, which contain the plosives in both initial and intermediate position, trying to represent the different phonetic contexts as much as possible.

In order to increase the representativeness of the database, we prepared another list containing the same words preceded by an article, a verb or a preposition. This second list aims at introducing inter-word coarticulation events in the phonemes, besides intra-word ones.

The 210 words were repeated three times by a non-professional adult male speaker, at intervals of about one month between repetitions. The final numbers of tokens were as follows: 254 /p/, 288 /t/ and 250 /k/, from which we used an evenly distributed number of tokens: 250 per class.
The words were recorded in a quiet laboratory environment with an average S/N ratio of 38 dB. Digital signal conversion was carried out in agreement with the standard defined in the European project SAM (Speech Assessment Methodologies), using the OROS card on a PC and the acquisition program EUROPEC. The signal was sampled at 16 bits with a sampling frequency of 20 kHz.

The 792 plosive tokens were hand-segmented to center the 150 ms input window over the voiced onset which follows the burst. A 512-point FFT was computed every 5 ms inside the window, from which we obtained 16 Bark-scale coefficients over a 46 Hz - 6 kHz range. Coefficients adjacent in time were averaged, yielding an overall rate of one frame every 10 ms; the coefficients were then normalized between -1 and +1 with the average at 0.0.
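A rough NumPy reconstruction of this front end (our illustration). The paper specifies the FFT size, the 5 ms analysis step, the 16 Bark-scale bands over 46 Hz - 6 kHz, the pairwise time averaging and the normalization; the analysis window type, the Bark formula and the rectangular filter shapes below are our assumptions.

```python
import numpy as np

FS = 20000                      # sampling frequency, Hz
NFFT = 512                      # FFT length
HOP = int(0.005 * FS)           # one FFT every 5 ms -> 100 samples

def bark(f):
    # Traunmüller's approximation of the Bark scale (our choice of formula)
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_filterbank(n_bands=16, f_lo=46.0, f_hi=6000.0):
    """Rectangular filters equally spaced on the Bark scale (an assumption)."""
    freqs = np.fft.rfftfreq(NFFT, d=1.0 / FS)
    edges = np.linspace(bark(f_lo), bark(f_hi), n_bands + 1)
    band = np.digitize(bark(np.clip(freqs, f_lo, f_hi)), edges) - 1
    fb = np.zeros((n_bands, len(freqs)))
    for i in range(n_bands):
        fb[i, band == i] = 1.0
    fb[:, (freqs < f_lo) | (freqs > f_hi)] = 0.0   # keep only 46 Hz - 6 kHz
    return fb

def preprocess(signal):
    fb = bark_filterbank()
    frames = []
    for k in range((len(signal) - NFFT) // HOP + 1):
        seg = signal[k * HOP : k * HOP + NFFT] * np.hanning(NFFT)
        frames.append(np.log(fb @ np.abs(np.fft.rfft(seg)) + 1e-6))
    coeffs = np.asarray(frames)
    coeffs = coeffs[: len(coeffs) // 2 * 2]
    coeffs = 0.5 * (coeffs[0::2] + coeffs[1::2])   # average adjacent frames:
    coeffs -= coeffs.mean()                        # one frame every 10 ms
    return coeffs / np.abs(coeffs).max()           # in [-1, +1], average 0.0
```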
Network simulation
In order to reduce computation time, we used an FPS M64/35 array processor, with 12 MFLOPS peak performance [6]. The measured speed of the implemented simulation algorithm (for the learning stage) was 0.79 MCPS (millions of connections modified per second).

Learning was stopped when the total error on the complete set of patterns fell below a threshold of 0.1. Reaching this condition required an average of 100 epochs.
EXPERIMENTAL RESULTS

Comparison between RTDNN and TDNN
In the classification experiment, the whole available set of 250 tokens per phoneme was split in half, using one half for learning and the other half for the test.
Each experiment was run with initial weights chosen randomly in the [-1,+1] range and, with those initial weights fixed, the range of allowed Beta values (the amplitude of feedback) was explored between 0.0 and 4.0 with step 0.2.
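In schematic form, the protocol amounts to the following loop; train, evaluate and n_weights are placeholders for the BPS training procedure, the test-set scoring and the network size, not the authors' code.

```python
import numpy as np

def train(init_weights, rho):
    """Placeholder: BPS training of the RTDNN at feedback amplitude rho."""
    raise NotImplementedError

def evaluate(net):
    """Placeholder: error rate of the trained net on the test half."""
    raise NotImplementedError

betas = np.arange(0.0, 4.0 + 1e-9, 0.2)    # Beta grid: 0.0, 0.2, ..., 4.0
n_runs, n_weights = 10, 10_000              # 10 random initializations
errors = np.zeros((n_runs, len(betas)))

for run in range(n_runs):
    rng = np.random.default_rng(run)
    init = rng.uniform(-1.0, 1.0, size=n_weights)      # fixed within a run
    for j, beta in enumerate(betas):
        errors[run, j] = evaluate(train(init, rho=beta))   # beta = 0: TDNN

mean_error = errors.mean(axis=0)            # the averaged curve of Fig. 3
```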
Fig. 3 shows the results of 10 experiments of this type, each one obtained with a different random set of initial weights. At Beta = 0 (the TDNN case), we obtained an average error rate of 5%, a value comparable to the one obtained by Waibel (2.3%) with a different database [3].

The RTDNN net (corresponding to higher Beta values) instead shows, in the error rate trend, a considerable reduction with respect to the TDNN case. The improvement is particularly marked for Beta values above 2.

Differences in the error rate trend across experiments, depending on the choice of initial weights, led us to carry out a statistical test in order to confirm real systematic variations.
Fig. 3: Averaged error rate vs. BETA (feedback amplitude). The vertical segments are confidence intervals (±1σ).
A Student's t-test for Beta values above 2 gives the following result: the hypothesis that the mean error is significantly smaller than in the TDNN case is accepted with a false-rejection percentage of 5%.

The statistical test thus confirms the significance of the improvements, particularly for Beta = 2.8. At this value, the final experimental result shows a relative error rate decrease of 27% with respect to the original TDNN model.
Shift influence

In order to test the time-invariance feature of TDNNs, and to see whether it also holds for RTDNNs, we carried out a second experiment.

In this case the 15 spectral vectors that form the learning and test patterns were randomly shifted by a maximum amount of ±40 ms. In order to achieve a significant comparison with the "no shift" case, we used for learning and test the same patterns, in the same order, as in the previous experiment.
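A sketch of the shift operation: each 15-frame pattern is displaced by up to ±4 frames (±40 ms at one frame per 10 ms). Zero-padding at the edges is our assumption; in the experiment the window was presumably re-extracted from the surrounding signal.

```python
import numpy as np

def random_shift(pattern, rng, max_shift=4):
    """Shift a (15, 16) pattern by up to +/-4 frames (+/-40 ms)."""
    s = int(rng.integers(-max_shift, max_shift + 1))
    shifted = np.zeros_like(pattern)
    if s >= 0:
        shifted[s:] = pattern[:len(pattern) - s]   # content moves later
    else:
        shifted[:s] = pattern[-s:]                 # content moves earlier
    return shifted

pattern = np.zeros((15, 16)); pattern[7] = 1.0     # marker in the middle frame
print(np.argmax(random_shift(pattern, np.random.default_rng(0)).sum(axis=1)))
```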
Fig. 4: Error rate vs. BETA obtained with 40 ms of maximum temporal shift.
Fig. 4 shows the error rate trend versus Beta, obtained in a single experiment. In this case the values have been approximated with a parabola obtained with the least-squares method.
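The fit is an ordinary least-squares polynomial fit; a sketch with illustrative values (not the paper's measurements), whose vertex plays the role of the suggested optimal Beta:

```python
import numpy as np

betas = np.arange(0.0, 4.0 + 1e-9, 0.2)
# Illustrative error rates only, shaped like Fig. 4 -- not measured data.
err = 0.4 * (betas - 2.0) ** 2 + 2.5

coeffs = np.polyfit(betas, err, deg=2)     # least-squares parabola
beta_opt = -coeffs[1] / (2 * coeffs[0])    # vertex of the fitted parabola
print(round(beta_opt, 2))                  # ~2.0, consistent with the text
```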
The experiment confirms the time-invariance property of TDNNs (deducible in Fig. 4 from the value corresponding to Beta = 0) and shows that, also with shifted patterns, RTDNNs retain their advantage over TDNNs, suggesting an optimal Beta value near 2 for error reduction.
CONCLUSIONS
This paper proposed a neural network model based on the introduction of feedback in the TDNN structure. Experiments on the classification of speech segments containing plosive phonemes show that the RTDNN model improves the recognition rate, confirming the superior ability of recurrent networks to handle the sequential nature of speech.

Future work will be devoted to extending the test to the whole corpus of Italian phonemes and to assessing the effect of spontaneous speech.
ACKNOWLEDGMENT
The authors would like to thank Dr. Alex Waibel for the encouragement provided to submit this paper to ICASSP, and Mr. Berardo Savetione for labeling the database.
References
[1] Bourlard, H., Wellekens, C. "Speech pattern discrimination and multilayer perceptrons", Computer Speech and Language, Vol. 3, pp. 1-19, 1989
[2] Elman, J.L., Zipser, D. "Learning the Hidden Structure of Speech", ICS Report 8701, Institute for Cognitive Science, University of California, San Diego, CA, 1987
[3] Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., Lang, K. "Phoneme Recognition Using Time-Delay Neural Networks", IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 37, No. 3, March 1989
[4] Gori, M., Bengio, Y., De Mori, R. "BPS: A Learning Algorithm for Capturing the Dynamic Nature of Speech", Proceedings of the IEEE-IJCNN 89, Washington, 1989
[5] Kuhn, G., Watrous, R.L., Ladendorf, B. "Connected Recognition with a Recurrent Network", Speech Communication, Vol. 9, pp. 41-48, 1990
[6] Corana, A., Rolando, C., Ridella, S. "Neural Network Simulation with High Performances on FPS M64 Series Minisupercomputers", FPS Users European Conference, Stratford-upon-Avon, 25-26 April 1989
[7] Bourlard, H., Wellekens, C.J. "Speech Dynamics and Recurrent Neural Networks", ICASSP 1989, Vol. 1
[8] Greco, F., Ravaioli, G. "An Experiment on Phoneme Classification Through a Time-Delay Neural Network", Proc. of the 3rd Workshop Italiano su Architetture Parallele e Reti Neuronali, Vietri sul Mare, Salerno, Italy, 15-19 May 1990, World Scientific Publishing
[9] Greco, F. "Realizzazione di un modello neuronale per la decodifica acustico-fonetica del parlato continuo" (A neural model for the acoustic-phonetic decoding of continuous speech), Graduation Thesis in Physics, University of Rome, November 1990