1) The document proposes a modification of the Time-Delay Neural Network (TDNN) structure, called the Recurrent Time-Delay Neural Network (RTDNN), that adds feedback between the first-hidden-layer neurons.
2) An experiment classifying unvoiced plosive phonemes in Italian words showed a 21% decrease in error rate, from 5% for the TDNN to 3.6% for the RTDNN.
3) A statistical analysis confirmed the RTDNN improvements as significant, with a 27% relative decrease in error rate with respect to the TDNN at the best feedback amplitude.
A RECURRENT TIME-DELAY NEURAL NETWORK FOR
IMPROVED PHONEME RECOGNITION
Fabio GRECO, Andrea PAOLONI, Giacomo RAVAIOLI
Fondazione Ugo Bordoni, Via Baldassarre Castiglione, 59 - 00142 Roma
ABSTRACT
In this work we propose a modification to the well-known structure of the Time-Delay Neural Network, obtained through a feedback at the first-hidden-layer level.
The experiment carried out with the new model, called RTDNN (Recurrent Time-Delay Neural Network), consists in the classification of the unvoiced plosive phonemes. These were extracted from initial and intermediate positions in a list of the most common Italian words, uttered by a male speaker, thus obtaining 250 tokens per phoneme. The training was carried out with a modified variant of Back Propagation, known as BPS (Back Propagation for Sequences), using half of the tokens for learning and the remaining half for the test. The error rate trend thus obtained shows a 21% decrease in a particular range of "Beta" (magnitude of feedback), with values ranging from 5% for the original TDNN model with no feedback to 3.6% for our RTDNN model.
INTRODUCTION
A neural network implemented for phoneme recognition has
to take into account the sequential nature of speech. Unlike static
pattern classification the net must be able to exploit the
information coming from the temporal dimension and decisions
must be more linked to the way in which patterns evolve in time
rather than to their shape at particular instants.
A solution adopted by many scientists is the extension of the input window, considering a network activated simultaneously by a large portion of speech. In this way they get round the difficulty of explicitly treating time and return to static pattern classification.
Nevertheless, the question of temporal window extension is quite troublesome, and experiments up to now are based on heuristic assumptions [1, 2].
Initially, it could seem that the window extension should be
wide enough to contain contextual information as much as
possible. There are, however, at least three fundamental
problems connected to the use of a wide signal window.
The first problem is that contextual information is not equally distributed, since the frames closer to the classifying ones are more important than the ones further away. The net should then exhibit a "forgetting effect" in order to exclude too old information and give more importance to the closer frames. The difficulty coming from the use of a wide signal window lies in the fact that it is not known how to weight the window in order to assure this effect.
The second problem is connected to local speech rate variations. Though with a small window such variations have a negligible effect, with wider windows considerable difficulties come up. In fact, in this case, learning to take such variations into account is a very difficult problem, and this leads to a worsening in performance.
The third problem is that a large window requires a net with a large number of weights in the input layer, and consequently a large amount of training time and data is necessary.
Time-Delay Neural Networks (TDNNs) [3] avoid the last
two problems with a network activated (in the input layer) by
signal windows of only 30 ms. The use of contextual
information based on a wider window (150 ms) occurs in the
upper network layer, where the extraction of low-level acoustical
features has already been achieved.
So there is the advantage of separating the acoustical feature extraction (sensitive to local signal distortion) from the integration of those features along the time dimension. By means of this separation, the possible effects of temporal variations can be better handled in the upper network layer, as most information is already codified.
RTDNN
TDNNs, as we said, take into account a context of 150 ms, but low-level signal analysis is limited to a window of 30 ms. Instead, it would be useful for the internal representation performed by the first-hidden-layer neurons to be influenced by a wider portion of speech, in order to take the context into greater account in the extraction of acoustical features. The solution obtained merely by extending the 30 ms signal window is not effective, for the above reasons.
Our solution allows a wider context to be taken into account, avoiding the problems related to the window increase. The method consists in using a high-level memory, generated through a feedback between the codified information resulting from the first-hidden-layer neuron outputs at time "t" and the input of the same neurons at time "t+1" (Fig. 1). In the spatially-extended representation of the new model, which we called RTDNN (Recurrent Time-Delay Neural Network), this modification can be seen as a sequence of connections acting horizontally within the first-hidden layer, between a frame of eight neurons and the following one (Fig. 2). The feedback weights are the same in the copies of time-shifted connections, thereby assuring the important property of time invariance of the original TDNN model.
The network thus developed has the advantage of coping
better with the sequential nature of speech; in this way the latter
is considered as a sequence of events rather than a succession of
static patterns without temporal ordering.
Unlike the TDNN, our model can explicitly exploit, in the acoustic feature extraction, information hidden in the temporal structure of speech. This is particularly important for phonemes like plosives or nasals, which exhibit a clear sequential structure.
The feedback action, in considering the contribution of closer frames, is qualitatively different from that coming from the extension of the analysis window.
© 1991 IEEE
Fig. 1: The RTDNN first-hidden-layer neuron
Fig. 2: The RTDNN architecture (output nodes /p/, /t/, /k/; input window of 15 frames, 150 ms)
As can be seen in Fig. 2, the information fed back is the one coming from the coding performed by the first hidden neurons, and not directly from the signal. This information is not limited to a finite and prefixed time interval (determined by the topology of the connections towards the input) but extends backward in time, with an extension determined in an adaptive way by the feedback weight magnitude.
This consideration raises the problem of stability; the
contribution of past frames must in fact decrease on going
backward in time, but this condition is assured only by some
values of feedback weights. Let’s see why:
referring to Fig. 2, let us call x(t) the vector constructed as a concatenation of the three input vectors of temporal coordinates t, t+1, t+2. Calling a_i the activation of neuron "i" of the first hidden layer (the value preceding the sigmoid function) and initially assuming an activation feedback, we can write:

a_i(t) = w_ii a_i(t−1) + Σ_j w_ij x_j(t) + b_i,  with a_i(0) = 0    (1)

where w_ii is the feedback weight. By induction we get:

a_i(T) = Σ_{t=1..T} w_ii^(T−t) p_i(t),  with p_i(t) = Σ_j w_ij x_j(t) + b_i    (2)
In order to estimate the influence of the signal window at time "t − τ" on neuron "i" at time "t", let us define the following parameter:

δ(t,τ) = ∂a_i(t) / ∂p_i(t−τ) = w_ii^τ    (3)

From this we can see that, in order to ensure the desired stability or "forgetting effect", the feedback weights must be constrained in the range (−1, +1).
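The recurrence and its forgetting effect can be illustrated with a short numerical sketch (our illustration, not the authors' code; the sizes and weight values are made up, with the feedback weight inside the stability range just derived):

```python
import numpy as np

# Sketch of the activation-feedback neuron of formula (1):
#   a_i(t) = w_ii * a_i(t-1) + sum_j w_ij * x_j(t) + b_i
# Names (W_in, b, w_fb) and all values are illustrative.
rng = np.random.default_rng(0)
n_in, T = 48, 15                # 3 frames x 16 coefficients, 15 time steps
W_in = rng.normal(scale=0.1, size=n_in)
b = 0.0
w_fb = 0.8                      # feedback weight, must lie in (-1, +1)

def activation_sequence(x_seq, w_fb):
    """Run the recurrence with a_i(0) = 0 and return all activations."""
    a, out = 0.0, []
    for x in x_seq:
        a = w_fb * a + W_in @ x + b
        out.append(a)
    return out

x_seq = rng.normal(size=(T, n_in))
a_seq = activation_sequence(x_seq, w_fb)

# Formula (3): the influence of the input at time t - tau decays as
# w_fb**tau, the "forgetting effect" (guaranteed only when |w_fb| < 1).
influence = [abs(w_fb) ** tau for tau in range(T)]
assert all(influence[k] > influence[k + 1] for k in range(T - 1))
```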
On observing formula (2), we can notice that the transfer function of the neuron with activation feedback is similar to that of an IIR (Infinite Impulse Response) filter, and is as follows:

Σ_{k=0..T} a^k z^(−k) ≈ 1 / (1 − a z^(−1))    (4)

which becomes an equality for "T" infinite. The "a" coefficient is thus related to the feedback weight w_ii of the RTDNN neuron.
The transfer function of the TDNN neuron is instead similar to that of a FIR (Finite Impulse Response) filter, because it is affected by the contribution of a finite number "D" of frames. This transfer function is of the following kind:

Σ_{i=0..D} a_i z^(−i)    (5)

where now the various a_i are different, and each of them is related to the connection matrix between the TDNN neuron and the input frame at time "t − i".
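The contrast between the two filter types can be made concrete with a toy comparison (illustrative coefficients, not taken from the paper):

```python
import numpy as np

# The RTDNN neuron behaves like a first-order IIR filter: its impulse
# response a**k decays geometrically but never becomes exactly zero.
# The TDNN neuron is a FIR filter: its impulse response vanishes
# beyond D frames. All numbers below are made up.
a, D, K = 0.7, 3, 10
iir_h = np.array([a ** k for k in range(K)])                   # formula (4)
fir_h = np.array([0.5, 0.3, 0.2, 0.1] + [0.0] * (K - D - 1))   # formula (5)

assert np.all(iir_h > 0)            # IIR response never dies out exactly
assert np.all(fir_h[D + 1:] == 0)   # FIR response vanishes after D frames
```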
In the case of output feedback, formula (1) becomes:

a_i(t) = w_ii f(a_i(t−1)) + Σ_j w_ij x_j(t) + b_i    (6)

where f(x) is the sigmoid function, and it is easily proven that in this case:
|δ(t,τ)| ≤ (|w_ii| / 4)^τ    (7)

If −4 < w_ii < +4, the effect of past frames will be modulated by decreasing terms, thereby ensuring the desired effect.
As can be seen in (6), the output feedback is a non-linear one, because of the sigmoid function, and the correspondence with IIR filters is not so immediate: the RTDNN neuron with output feedback will then be described as a kind of non-linear IIR filter. Because of the greater generality of this model, we adopted it in our experiment with the RTDNN.
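As a sanity check of the bound, one can perturb an input frame τ steps in the past and measure the effect on the final activation. This sketch assumes the logistic sigmoid, whose derivative never exceeds 1/4; the variable names are ours:

```python
import numpy as np

# Sketch of the output-feedback neuron of formula (6) with a scalar
# drive p(t): a(t) = w * f(a(t-1)) + p(t). Since f'(x) <= 1/4 for the
# logistic sigmoid, the sensitivity to a frame tau steps back is
# bounded by (|w|/4)**tau, as in formula (7), so |w| < 4 suffices.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def run(p_seq, w):
    a = 0.0
    for p in p_seq:
        a = w * sigmoid(a) + p
    return a

rng = np.random.default_rng(1)
p_seq = rng.normal(size=20)
w, eps, tau = 3.5, 1e-6, 10     # |w| < 4; perturb the frame 10 steps back

p_pert = p_seq.copy()
p_pert[-1 - tau] += eps
sensitivity = abs(run(p_pert, w) - run(p_seq, w)) / eps
assert sensitivity <= (abs(w) / 4.0) ** tau + 1e-9   # formula (7) holds
```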
THE LEARNING ALGORITHM
The introduction of feedback in the TDNN structure creates
some trouble in the learning algorithm.
The algorithm used for the learning of the RTDNN follows an approach similar to the one proposed by Gori [4] and Kuhn [5]. The method, known as BPS (Back-Propagation for Sequences), avoids, during the back-propagation stage, the backward path in time to the initial point proposed by Rumelhart with the "time-unfolding" technique. This is implemented at the price of some additional variables computed in the feed-forward stage and propagated forward in time.
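The flavour of these forward-propagated variables can be sketched for the simplest case (a hypothetical single neuron with scalar drive; an illustration in the spirit of BPS, not the paper's algorithm):

```python
# For the linear activation feedback of formula (1),
#   a(t) = w * a(t-1) + p(t),
# the gradient e(t) = d a(t) / d w obeys the forward recurrence
#   e(t) = a(t-1) + w * e(t-1),
# so the gradient is accumulated while moving forward in time,
# with no backward path through the sequence.
w = 0.6
p = [0.3, -0.2, 0.5, 0.1]       # made-up scalar input drive

def final_a(w):
    a = 0.0
    for pt in p:
        a = w * a + pt
    return a

a, e = 0.0, 0.0
for pt in p:
    e = a + w * e               # update the gradient before the activation
    a = w * a + pt

eps = 1e-6
numeric = (final_a(w + eps) - final_a(w - eps)) / (2 * eps)
assert abs(e - numeric) < 1e-6  # matches a finite-difference check
```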
The bound on the feedback weights, −4 < w_ii < +4 (necessary to ensure the model stability), has been implemented by introducing some control variables k_ii and defining:

w_ii = β tanh(k_ii),  with 0 ≤ β ≤ +4    (8)

The k_ii variables, which are unlimited, are therefore the ones modified by the learning algorithm. The β parameter introduced in (8) allows the feedback amplitude to be varied continuously inside the range of values 0 ≤ β ≤ +4.
By setting β = 0, the described algorithm becomes the usual Back-Propagation algorithm and the RTDNN net is converted into Waibel's TDNN.
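The parametrization of (8) can be sketched directly (illustrative values; `feedback_weight` is our name):

```python
import numpy as np

# Formula (8): the learning algorithm updates an unconstrained variable
# k_ii, while the actual feedback weight w_ii = beta * tanh(k_ii)
# stays inside [-beta, +beta] for any k_ii, so the stability bound is
# enforced by construction.
def feedback_weight(k, beta):
    return beta * np.tanh(k)

beta = 2.8                       # feedback amplitude, 0 <= beta <= 4
for k in [-100.0, -1.0, 0.0, 1.0, 100.0]:
    w = feedback_weight(k, beta)
    assert -beta <= w <= beta    # bound never violated
assert feedback_weight(0.0, beta) == 0.0   # beta = 0 recovers the TDNN
```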
EXPERIMENT SET UP
The aim in carrying out the experiments was to compare the performance of the new model with that of TDNNs. It was also of interest to measure, for the Italian language, the classification capabilities of both models. To make this comparison, we decided to treat the classification of plosive phonemes in our experiment with RTDNNs, since experiments with these phonemes are very accurately described in other papers about TDNNs. In particular, we chose the unvoiced plosive phonemes (/p/, /t/, /k/), which are easier to segment.
The network we used in our experiment is the one depicted
in Fig. 2, with three output nodes and feedback connections.
Database
In order to prepare the training and test database, we chose 105 Italian words, among the 10,000 most common ones, which contain the plosives in both initial and intermediate position, trying to represent the different phonetic contexts as much as possible.
In order to increase the database representativeness, we prepared another list containing the same words preceded by an article, a verb or a preposition. This second list aims at introducing inter-word coarticulation events in the phonemes, besides intra-word ones.
The 210 words were repeated three times by a non-professional adult male speaker, at intervals of about one month between repetitions. The final number of tokens was as follows: 254 /p/, 288 /t/ and 250 /k/, from which we used an evenly distributed number of tokens: 250 per class.
The words were recorded in a quiet laboratory environment with an average S/N ratio of 38 dB. Signal conversion to digital form was carried out in agreement with the standard defined in the European project SAM (Speech Assessment Methodologies), using the OROS card on a PC and the acquisition program EUROPEC. The signal was sampled at 16 bits with a sampling frequency of 20 kHz.
The 792 plosive tokens were hand-segmented to center the 150 ms input window over the voiced onset which follows the burst. A 512-point FFT was computed every 5 ms inside the window, from which we obtained 16 Bark-scale coefficients over a 46 Hz - 6 kHz range. Coefficients adjacent in time were averaged, yielding an overall frame rate of 10 ms; the coefficients were then normalized between -1 and +1 with the average at 0.0.
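A hedged sketch of this preprocessing chain (the band edges below are linearly spaced placeholders, whereas a true Bark scale is non-linear, and the input is random noise standing in for speech):

```python
import numpy as np

# Illustrative pipeline: 512-point FFT every 5 ms on 20 kHz audio,
# 16 band coefficients over 46 Hz - 6 kHz, pairwise time averaging
# (10 ms frame rate), then normalization to [-1, +1] with zero mean.
fs, win = 20000, 512
rng = np.random.default_rng(2)
signal = rng.normal(size=fs)              # one second of fake "speech"
hop = int(0.005 * fs)                     # 5 ms hop

frames = [signal[s:s + win] for s in range(0, len(signal) - win, hop)]
spectra = np.abs(np.fft.rfft(np.array(frames) * np.hanning(win), axis=1))

# Pool FFT bins into 16 bands covering 46 Hz - 6 kHz (placeholder edges;
# the paper's Bark-scale spacing would be non-linear).
edges = np.linspace(46, 6000, 17)
bins = np.fft.rfftfreq(win, d=1.0 / fs)
bands = np.array([spectra[:, (bins >= lo) & (bins < hi)].mean(axis=1)
                  for lo, hi in zip(edges[:-1], edges[1:])]).T

half = len(bands) // 2
coeffs = 0.5 * (bands[0::2][:half] + bands[1::2][:half])  # 10 ms frames
coeffs -= coeffs.mean()                   # average at 0.0 ...
coeffs /= np.abs(coeffs).max()            # ... and range within [-1, +1]
assert coeffs.shape[1] == 16
```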
Network simulation
In order to reduce computation time, we used an FPS M64/35 array processor, with 12 MFLOPS peak performance [6]. The measured speed of the implemented simulation algorithm (for the learning stage) was 0.79 MCPS (Millions of Connections modified Per Second).
Learning was stopped when the total error on the complete set of patterns fell below a threshold of 0.1. To reach this situation an average number of 100 epochs was required.
EXPERIMENTAL RESULTS
Comparison between RTDNN and TDNN
In the classification experiment, the whole available set of 250 tokens per phoneme was divided in two, using one half for learning and the remaining one for the test.
Each experiment was run with initial weights chosen randomly in the [-1,+1] range and, having fixed the initial weights, the range of allowed Beta values (the amplitude of feedback) was explored between 0.0 and 4.0 with step 0.2.
Fig. 3 shows the results of 10 experiments of this type, each one obtained with a different random set of initial weights. At Beta = 0 (the TDNN case), we obtained an average error rate of 5%, a value comparable to the one obtained by Waibel (2.3%) with a different database [3].
The RTDNN net (corresponding to higher Beta values) instead shows, in the error rate trend, a considerable reduction with respect to the TDNN case. The improvement is particularly marked for Beta values above 2.
Differences in the error rate trend across experiments, depending on the choice of initial weights, led us to carry out a statistical test in order to isolate real systematic variations.
Fig. 3: Averaged error rate vs. BETA (feedback amplitude). The vertical segments are confidence intervals (±1σ).
From a Student's t-test, the result for Beta values above 2 is as follows: the hypothesis that the mean error is considerably smaller than in the TDNN case is accepted with a percentage of false rejection of 5%.
The statistical test thus confirms the significance of the improvements, particularly for Beta = 2.8. At this value, the final experimental result shows a relative error rate decrease of 27% with respect to the original TDNN model.
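A test of this kind can be sketched as a one-sided paired t-test over the per-run error rates (the numbers below are invented for illustration and are not the paper's data):

```python
import math

# Hypothetical error rates (%) from ten runs with different random
# initial weights, at Beta = 0 (TDNN) and Beta = 2.8 (RTDNN).
tdnn  = [5.2, 4.8, 5.1, 5.0, 4.9, 5.3, 5.0, 4.7, 5.1, 4.9]
rtdnn = [3.7, 3.5, 3.8, 3.6, 3.4, 3.9, 3.6, 3.5, 3.7, 3.6]

# Paired t statistic on the per-run differences, n - 1 = 9 dof.
d = [a - b for a, b in zip(tdnn, rtdnn)]
n = len(d)
mean = sum(d) / n
sd = math.sqrt(sum((x - mean) ** 2 for x in d) / (n - 1))
t = mean / (sd / math.sqrt(n))

# With 9 dof the one-sided 5% critical value is about 1.833: if t
# exceeds it, the hypothesis of a smaller mean error is accepted
# with a 5% false-rejection probability.
assert t > 1.833
```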
Shift influence
In order to test the time-invariance features of TDNNs, and to try and see whether this property also holds for RTDNNs, we carried out a second experiment.
In this case the 15 spectral vectors that form the learning and test patterns are randomly shifted by a maximum amount of ±40 ms. In order to achieve a significant comparison with the "no shift" case, we used for learning and test the same patterns, in the same order, as in the previous experiment.
Fig. 4: Error rate vs. BETA obtained with 40 ms of maximum temporal shift.
Fig. 4 shows the error rate trend versus Beta, obtained in a single experiment. In this case the values have been fitted with a parabola obtained with the least squares method.
The experiment confirms the time-invariance property of TDNNs (deducible in Fig. 4 from the value corresponding to Beta = 0) and shows that, also with shifted patterns, RTDNNs continue to hold an advantage over TDNNs, suggesting an optimal Beta value near 2 in view of error reduction.
CONCLUSIONS
This paper proposed a neural network model based on the use of feedback in the TDNN structure. Experiments on the classification of speech segments related to plosive phonemes show that the RTDNN model improves the recognition rate, confirming the superior ability of recurrent networks to treat the sequential nature of speech.
Future work will be devoted to extending the test to the whole corpus of Italian phonemes and to trying the effect of spontaneous speech.
ACKNOWLEDGMENT
The authors would like to thank Dr. Alex Waibel for the encouragement provided to submit this paper to ICASSP, and Mr. Berardo Savetione for labeling the database.
References
[1] Bourlard, H., Wellekens, C., "Speech pattern discrimination and multilayer perceptrons", Computer Speech and Language, pp. 1-19, 1989.
[2] Elman, J.L., Zipser, D., "Learning the Hidden Structure of Speech", ICS Report 8701, Institute for Cognitive Science, University of California, San Diego, CA, 1987.
[3] Waibel, A., Hanazawa, T., Hinton, G., Shikano, K., Lang, K., "Phoneme Recognition Using Time-Delay Neural Networks", IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 37, No. 3, March 1989.
[4] Gori, M., Bengio, Y., De Mori, R., "BPS: A Learning Algorithm for Capturing the Dynamic Nature of Speech", Proceedings of the IEEE-IJCNN 89, Washington, 1989.
[5] Kuhn, G., Watrous, R.L., Ladendorf, B., "Connected Recognition with a Recurrent Network", Speech Communication, pp. 41-48, 1990.
[6] Corana, A., Rolando, C., Ridella, S., "Neural Network Simulation with High Performances on FPS M64 Series Minisupercomputers", FPS Users European Conference, Stratford-upon-Avon, 25-26 April 1989.
[7] Bourlard, H., Wellekens, C.J., "Speech Dynamics and Recurrent Neural Networks", ICASSP 1989.
[8] Greco, F., Ravaioli, G., "An Experiment on Phoneme Classification Through a Time-Delay Neural Network", Proc. of the "3rd Workshop Italiano su Architetture Parallele e Reti Neuronali", Vietri sul Mare, Salerno, Italy, 15-19 May 1990, World Scientific Publishing.
[9] Greco, F., "Realizzazione di un modello neuronale per la decodifica acustico-fonetica del parlato continuo" (Implementation of a neural model for the acoustic-phonetic decoding of continuous speech), Graduation Thesis in Physics, University of Rome, November 1990.