SlideShare a Scribd company logo
1 of 6
Download to read offline
Sukhdeep Kaur et al Int. Journal of Engineering Research and Applications
ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.749-754

RESEARCH ARTICLE

www.ijera.com

OPEN ACCESS

Enhancement of Speech Recognition Algorithm Using DCT and
Inverse Wave Transformation
Sukhdeep Kaur, Er. Gurwinder Kaur
Research Student, Assistant Professor
(Electronics and Communication Engineering Department Yadavindra College of Engineering (YCoE)
Talwandi Sabo, District Bathinda, Punjab, India.)
Electronics and Communication Engineering Department Yadavindra College of Engineering (YCoE),
Talwandi Sabo, District Bathinda, Punjab, India.)

ABSTRACT
This research work focus on providing better performance in speech recognition algorithm by integrating digital
signal transposition with speech recognition techniques. This is an approach for improving the performance of
speech recognition algorithm using butterworth stopband filter and discrete cosine transformation based speech
compression with inverse wave transformation. The main objective is to integrate filtering with speech
recognition algorithm to improve the results when noise is present in the signal. In this work, the matching is
done using inverse wave transformations which reduce the time for recognition of voices. Proposed algorithm is
designed and implemented in MATLAB. The proposed algorithm has been tested on the given samples and
evaluated using different recognizable and unrecognizable samples obtaining a recognition ratio about 98%. It is
shown that the proposed algorithm provides better results than existing techniques. Proposed algorithm increase
the accuracy of the speech recognition system.
Keywords: filter, inverse wave transformation, speech recognition, wave format.

I.

Introduction

The speech recognition is defined as the
process of considering the spoken word as an input
speech and matches it with the previously recorded
speeches on basis of various parameters. This can be
done by various methods. It is a process of
automatically recognizing who is speaking on the
basis of features of speaker of the speech signal.
Speech recognition features has some of the
advantages like speech input is easy to do because it
does not demand a specialized skill as does typing or
push button procedures. Information can be input
even when the user is not constant or doing other
activities including the hands, eyes, legs or ears.
Since a telephone or microphone can be used as an
input terminal [1].
Basically, speaker recognition is classified
in to speaker identification and speaker verification.
Wide application of speech recognition system
includes control access to services such as database
access services banking by telephone, voice dialing
telephone shopping. Now speech recognition
technology is the most desirable technology to create
new services.

II.

Classification of Speech Recognition
Systems

Most speech recognition systems can be
classified according to the following categories [2]:

www.ijera.com



Speaker
Dependent
versus
Speaker
Independent
A speaker‐dependent speech recognition
system is one that is trained to recognize the speech
of only one speaker. A speaker‐independent system
is one that is independence is difficult to attain, as
speech recognition organizations tend to become
adjusted to the speakers they are trained on, resulting
in error rates are higher than speaker dependent
system.


Isolated versus Continuous
In isolated speech the speaker pauses
shortly between every word, while in continuous
speech the speaker speaks in a continuous and
possibly long stream with little or no breaks in
between. Isolated speech recognition systems are
easy to build. Words spoken in continuous speech on
the other hand are subjected to the co-articulation
effect, in which the pronunciation of a word is
modified by the words surrounding it.


Keyword based versus Subword unit based
A speech recognition system can be trained
to recognize whole words, like dog or cat. This is
useful in applications like voice‐command‐systems,
in which the system need only recognize a small set
of words. This approach is simple but not scalable.
As the dictionary of recognized words grow, so too
the complexity and execution time of the recognizer.

749 | P a g e
Sukhdeep Kaur et al Int. Journal of Engineering Research and Applications
ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.749-754
III.

www.ijera.com

Approaches to Speech Recognition

Basically there exist three approaches to
speech recognition. They are [11]:


Acoustic Phonetic Approach
Acoustic-phonetic approach assumes that
the phonetic units are broadly characterized by a set
of
features
such
as
format
frequency,
voiced/unvoiced and pitch. These features are
extracted from the speech signal and are used to
segment and level the speech.


Pattern Recognition Approach
Pattern recognition approach requires no
explicit knowledge of speech. This approach has two
steps – namely, training of speech patterns based on
some generic spectral parameter set and recognition
of patterns via pattern comparison. The popular
pattern recognition techniques include template
matching, Hidden Markov Model


Artificial Intelligence Approach
Knowledge based approach attempts to
mechanize the recognition procedure according to
the way a person applies its intelligence in
visualizing, analyzing and finally making a decision
on the measured acoustic features. Expert system is
used widely in this approach.

IV.

Problem Definition

In problem definition define different
problems in existing approaches and how these
problems will be eliminated or reduced using
improved audio recognition algorithm.
 The methods developed so far fail or not give
efficient results when there exist noise in the
audio signal.
 Noises add too much disturbance on the given
signal and which decreases the audio
recognition algorithm accuracy rate.
 Noise leaves bad effects on the bandwidth of a
given network so it becomes major issue in
audio signals.
 Exiting networks such as neural networks take
much time in training period

V.

Proposed Work

In the proposed method, the goal is to detect
the speaker from the previously recorded wave
samples. The main concentration is on accuracy and
speed. The proposed method is implemented using
MatLab.

www.ijera.com

Fig.1. flow-chart for speech recognition algorithm
In the proposed algorithm samples of
dummy speeches are taken for experiments and a
database is prepared in which these speeches are
saved. The speech samples are taken in the wave
format. The input samples are taken from various
persons by recording their voices. These are the
tested voices TVi for the given system. In the testing
phase test voice is match with the stored input voice
and recognition decisions are made.
Butterworth filter is used to remove the
noise from the system. Speech signals degrade due to
the presence of background noise and noise
reduction is an important field of speech processing.
Butterworth stopband filter is used to minimize the
disturbance from the speech signals.
Discrete cosine transforms (DCT) based
speech compression is used to reduce the size of the
speech information. It is used to speed up the system
by remove the redundancy from audio information.
Compression is the process of elimination of
redundancy and duplicity. The DCT is very common
when encoding video and speech tracks on
computers. The DCT is very similar to the DFT but
the output values of DCT are real numbers and the
output vector is approximately twice as long as the
750 | P a g e
Sukhdeep Kaur et al Int. Journal of Engineering Research and Applications
ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.749-754
DFT output. It shows a sequence of finite data points
in terms of sum of cosine functions.
The 1-d discrete cosine transforms

k=1, 2,...N
(1)
Where

www.ijera.com

persons is taken and then tested with the previously
stored 10 certified recognized voices.
Table 1 shows the number of certified audio
samples that are saved according to which the inputs
can be matched. For the experimental setup, 10
samples of certified audio clips are taken. 100
samples of unrecognized inputs are taken which have
to be matched with certified audio samples. Again
100 samples of unknown person audio recordings are
taken.
Table No. 1. Set of Samples
Type

Sample Size

Certified

10

(2)

Input Recognized

100

N is the length of x, and x and y are the
same size. If x is a matrix, DCT transforms its
columns. The series is indexed from n = 1 and k = 1
instead of the usual n = 0 and k = 0 because
MATLAB vectors run from 1 to N instead of from 0
to N- 1. The transpose of each of the tested voices
TV is taken. The transpose is taken so that the
speech frames that are generally shown horizontally
could be converted in vertical form so that it can be
easy to understand and recognize the speech frames
easily.
Take the input sample of voice that has to
be tested. Read the wave input and take its
transpose. The input sample taken is in wave format
and it is tested among the database that contains
previously recorded voice sample.
Match the input voice with tested database
one by one and find correlation. Correlation
computes a measure of similarity of two input
signals as they are shifted by one another. The
correlation result reaches a maximum at the time
when the two signals match best. The correlation
value is denoted by M. The highest the value of
correlation, more the two voices are similar.
Find the maximum value of correlation and
let Correlation be M. If the value of M is greater than
the threshold then recognized else not found. If the
correlation is more than the threshold value then it
must be the same audio else the input audio is not
detected in the database. The threshold value is set
first. If the correlation value is greater than the
threshold value then voice input sample will be
called as recognized otherwise the tested voices will
not contain the input sample of voice.

Unknown Person Audio

100

The 200 samples in total are taken for
testing. For the samples are taken then hits are
calculated which gives the number of times the
correct recognition is done when the input is present
in the previously saved set of certified voices.
Recognition is done by matching the input samples
voices having wave format with the previously
certified set of samples. Similarly, number of miss
are evaluated accordingly voices when sample is not
found in the record of certified voices.

VI.

Results

The input audio signal waveform is shown
in figure 2. It represents an audio signal or recording.
The amplitude of the signal is measured on the yaxis and time is measured on the x-axis. This figure
shows that background disturbance is present in the
speech signals. The disturbance in signals degrades
the performance. The accuracy rate slightly
decreases as errors are introduced.
Input audio signal waveform

0.8
0.6
0.4

Amplitude

0.2
0
-0.2
-0.4
-0.6
-0.8

5.1 Database
The database consists of recorded voices
which are used for matching with the input samples.
This database consists of 100 different voices. These
are different images with different poses from
different type of sources.
Various samples are taken in which 100 set
of recognized persons and 100 set of unrecognized

www.ijera.com

-1

0.5

1

1.5

2

2.5
Time

3

3.5

4

4.5

5
5

x 10

Fig. 2. input audio signal waveform
The spectrogram generated for a wave input
is shown below in figure3. Spectrogram is Timedependent frequency analysis. In the spectrogram the
time axis is the horizontal axis and frequency is the
751 | P a g e
Sukhdeep Kaur et al Int. Journal of Engineering Research and Applications
ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.749-754
vertical axis. The input audio specgram shows that
background disturbance is present in the speech
signals and the accuracy rate slightly decreases.

www.ijera.com

Filtered audio specgram
8000
7000

Input audio specgram
6000

7000

5000

Frequency

8000

6000

3000

5000
Frequency

4000

2000

4000

1000

3000

0

2000

5

1000

10

15
Time

20

25

30

Fig. 5. filtered audio specgram

0
5

10

15
Time

20

25

30

Fig. 3. input audio specgram
The filtered audio signal waveform is
shown in figure 4. The background disturbance in
the signals degrades the performance and noise
reduction is an important field of speech processing.
The filtered audio signal waveform shows that
background noise was successfully removed from a
signal by using butterworth stopband filter. In this
matching is done using inverse wave transformations
which reduce the time for recognition of voices. The
filtered audio waveform shows that it has more
accuracy.

The compressed audio signal waveform is
shown in figure 6. DCT based audio compression is
used with butterworth stopband filter and inverse
wave transformation to remove the disturbance from
the speech signals and to speed up the process. DCT
based data compression is the process of encoding
information utilizing fewer bits than an un-encoded
representation would use through use of particular
encoding schemes. Background noise and
redundancy are successfully removed from a signal
by using Butterworth stopband filter and DCT. The
compressed audio waveform shows that results are
much better than exiting methods.
Compressed audio signal waveform

Filtered audio signal waveform

1

1
0.5

Amplitude

Amplitude

0.5

0

0
-0.5

-0.5
-1

-1

0.5

0.5

1

1.5

2

2.5
Time

3

3.5

4

4.5

5
5

x 10

Fig. 4. filtered audio signal waveform
The spectrogram generated for a wave input
is shown below in figure 5. The butterworth
stopband filter is used to remove the disturbance
from the speech signals and the accuracy is slightly
increased. The matching is done using inverse wave
transformations which reduce the time for
recognition of voices. The filtered audio spectrogram
shows that it has more accuracy than the other
techniques.

www.ijera.com

1

1.5

2

2.5
Time

3

3.5

4

4.5

5
5

x 10

Fig. 6. compressed audio signal waveform
The spectrogram generated for a wave input
is shown below in figure 7. DCT audio compression
is used with butterworth stopband filter and inverse
wave transformation to remove the disturbance from
the speech signals and the accuracy is slightly
increased. The compressed audio specgram shows
that it has better results than the previous one.

752 | P a g e
Sukhdeep Kaur et al Int. Journal of Engineering Research and Applications
ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.749-754
Compressed audio specgram

www.ijera.com

MISS ANALYSIS
100

8000

Sample size
Proposed
Old technique

90

7000

80

6000

70
Number of misses

Frequency

5000
4000
3000

60
50
40
30

2000

20

1000

10
0
5

10

15
Time

20

25

30

0

1

2

3

4

5
6
Number of tries

Fig. 7. compressed audio specgram

7

8

9

10

Fig. 9. miss analysis

6.1 Comparison
Then according to formulas accuracy and
error rates are calculated.
Figure 8 represents the hits analysis. It shows the
difference between the number of hits of existing
techniques and proposed technique. In order to
distinguish them different colours has been used. In
this fig X-axis represent number of tries and Y-axis
represent number of hits. It has been plotted with
different values of input samples and hits are
calculated from the recognition of speeches. It is
clearly seen that the hit ratio of old technique is very
low as compared to new technique. As a result, more
hits are obtained using DCT in speech recognition.

Figure 10 represents the accuracy analysis. It shows
the difference between the accuracy of existing
techniques and proposed technique. In this fig X-axis
represent number of tries and Y-axis represent
accuracy in percentage. The maximum value of
accuracy is 100 percent. It has been plotted with
different values of input samples and accuracies are
calculated from the recognition of speeches. It is
clearly seen that the accuracy of new technique is
very high as compared to old technique.
ACCURACY ANALYSIS
100
Maximum
Proposed
Old technique

98

HITS ANALYSIS
96

100
Sample size
Proposed
Old technique

80

94

Accuracy (%)

90

Number of hits

70
60

90
88
86

50

84

40

82

30

80

20
10
0

92

1

2

3

4

5
6
Number of tries

7

8

9

10

Fig. 10. accuracy analysis
1

2

3

4

5
6
Number of tries

7

8

9

10

Fig. 8. hit analysis
Figure 9 represents the miss analysis. It
shows the difference between the number of misses
of existing techniques and proposed technique. In
this fig X-axis represent number of tries and Y-axis
represent number of misses. It is clearly seen that the
miss ratio of new technique is very low as compared
to old technique. As a result, less misses are obtained
using DCT and filter in speech recognition. Miss
ratio should be minimum and hit ratio should be
maximum in order to achieve full accuracy.

www.ijera.com

Figure 11 represents the error rate analysis.
It shows the difference between the error rate of
existing techniques and proposed technique. In this
fig X-axis represent number of tries and Y-axis
represent error rate in percentage. The maximum
value of error rate is 100 percent. It is clearly seen
that the error rate of new technique is very low as
compared to old technique. As a result, less error rate
is obtained only when there are more hits and lesser
misses. It is clearly seen that error rate of the old
techniques is very high as close to the maximum
value of error rate and the error rate of the new
technique is very low as close to X-axis.

753 | P a g e
Sukhdeep Kaur et al Int. Journal of Engineering Research and Applications
ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.749-754
ERROR RATE ANALYSIS
100
Maximum
Proposed
Old technique

90
80

[2]

Error rate(%)

70
60
50

[3]

40
30
20
10
0

1

2

3

4

5
6
Number of tries

7

8

9

10

[4]

Fig. 11. error rate analysis

VII.

CONCLUSION AND FUTURE
WORK

7.1 Conclusion
The technique is able to authenticate the
particular speaker based on the individual
information that is included in the audio signal. The
performance of the new technique is better than the
existing techniques. Most of existing techniques does
recognition of speech through the text wrapping.
These techniques take too much time in recognizing
the speech and are not accurate in the noisy
environment. So the DCT and Butterworth stopband
filter is used with inverse wave transformation to
obtain the more accuracy and speed up the system.
The DCT packs energy in the low frequency regions.
The waveforms and specgrams showed that new
technique provides high accuracy rate and consumes
very less time in recognition. The error rate of the
proposed algorithm is very low as compared to old
algorithm.
7.2 Future Scope
In the future, the focus can be on reducing
the noise or background disturbance that is
introduced in the speech samples automatically
while recording. Modified discrete cosine transform
(MDCT) will be future compression algorithm,
whether standalone or combination of speech and
still or moving images. The various filtering
techniques can be applied in order to reduce
disturbance. By using these various filter techniques
speech recognition will be more accurate and fast.
The results can be further generalized if we are able
to unite voice activation detection with this
procedure we can perform speech recognition on live
voices and speech. More research has to be done on
this particular area to obtain more security.

References

[1]

Susanta Kumar Sarangi, and Goutam Saha,
“A Novel Approach in Feature Level for
Robust
Text-Independent
Speaker
Identification
systemhigher”,
IEEE
Proceedings
of
4th
International

www.ijera.com

[5]

[6]

[7]

[8]

[9]

[10]

[11]

www.ijera.com

Conference on Intelligent Human Computer
Interaction, December 27-29, 2012
Santosh K.Gaikwad, Bharti W. Gawali, and
Pravin Yannawar, “A Review on Speech
Recognition Technique”, International
Journal of Computer Applications (0975 –
8887), Vol.10– No.3, November 2010.
K. Ramamohan Rao and P. Yip, “Discrete
Cosine
Transform:
Algorithms,
Advantages, Applications”, Academic
Press Professional, Inc san Diego, CA,
USA, 1990.
Swapnil D. Daphal, and Sonal K. Jagtap,
“DSP Based Improved Speech Recognition
System”, International Conference on
Communication,
Information
and
Computing Technology, IEEE 2012.
C. Y. Fook, “Malay Speech Recognition
and Audio Visual Speech Recognition”,
International Conference on Biomedical
Engineering (ICoBE), pp. 479-484, Feb.
2012.
D. Addou, S.A. Selouani, M. Boudraa, and
B. Boudraa “Transform-based multi-feature
optimization for robust distributed speech
recognition”, IEEE GCC conference and
exhibition, februaury, 2011.
M.D. Pawar, and S.M. Badave,”Speaker
Identification System Using Wavelet
Transformation and Neural Network”,
International
Journal
of
Computer
Applications in Engineering Sciences, Vol.
1, special issue on CNS, July 2011.
Yu Shao, “Bayesian Separation with
Sparsity Promotion in Perceptual Wavelet
Domain for Speech Enhancement and
Hybrid Speech Recognition”, IEEE
Transactions on Systems, Vol. 41, pp. 284294, March 2011.
Ozlem Kalinli, Michael L. Seltzer, Jasha
Droppo, and Alex Acero,” Noise Adaptive
Training for Robust Automatic Speech
Recognition”, IEEE Transactions On
Audio, Speech and Language Processing,
Vol. 18, No. 8, November 2010.
Mohamad Adnan Al-Alaoui, Lina Al-Kanj,
Jimmy Azar1, and Elias Yaacoub, “Speech
Recognition using Artificial Neural
Networks and Hidden Markov Models”,
IMCL 2008 Conference, 16-18 April 2008.
Patricia Scanlon, Daniel P. W. Ellis, and
Richard B. Reilly, “Using Broad Phonetic
Group Experts for
Improved Speech
Recognition” IEEE Transactions on audio,
speech and language processing, Vol. 15,
pp. 803-812, March 2007.

754 | P a g e

More Related Content

What's hot

Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By MatlabAnkit Gujrati
 
Ai based character recognition and speech synthesis
Ai based character recognition and speech  synthesisAi based character recognition and speech  synthesis
Ai based character recognition and speech synthesisAnkita Jadhao
 
Esophageal Speech Recognition using Artificial Neural Network (ANN)
Esophageal Speech Recognition using Artificial Neural Network (ANN)Esophageal Speech Recognition using Artificial Neural Network (ANN)
Esophageal Speech Recognition using Artificial Neural Network (ANN)Saibur Rahman
 
Speaker recognition in android
Speaker recognition in androidSpeaker recognition in android
Speaker recognition in androidAnshuli Mittal
 
Speaker recognition on matlab
Speaker recognition on matlabSpeaker recognition on matlab
Speaker recognition on matlabArcanjo Salazaku
 
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...csandit
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech RecognitionAhmed Moawad
 
Review Paper on Noise Reduction Using Different Techniques
Review Paper on Noise Reduction Using Different TechniquesReview Paper on Noise Reduction Using Different Techniques
Review Paper on Noise Reduction Using Different TechniquesIRJET Journal
 
A Survey on Speaker Recognition System
A Survey on Speaker Recognition SystemA Survey on Speaker Recognition System
A Survey on Speaker Recognition SystemVani011
 
Automatic speech recognition system using deep learning
Automatic speech recognition system using deep learningAutomatic speech recognition system using deep learning
Automatic speech recognition system using deep learningAnkan Dutta
 
Wavelet Based Feature Extraction for the Indonesian CV Syllables Sound
Wavelet Based Feature Extraction for the Indonesian CV Syllables SoundWavelet Based Feature Extraction for the Indonesian CV Syllables Sound
Wavelet Based Feature Extraction for the Indonesian CV Syllables SoundTELKOMNIKA JOURNAL
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognitionCharu Joshi
 
Course report-islam-taharimul (1)
Course report-islam-taharimul (1)Course report-islam-taharimul (1)
Course report-islam-taharimul (1)TANVIRAHMED611926
 
Homomorphic speech processing
Homomorphic speech processingHomomorphic speech processing
Homomorphic speech processingsivakumar m
 

What's hot (19)

H42045359
H42045359H42045359
H42045359
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
 
Ai based character recognition and speech synthesis
Ai based character recognition and speech  synthesisAi based character recognition and speech  synthesis
Ai based character recognition and speech synthesis
 
Esophageal Speech Recognition using Artificial Neural Network (ANN)
Esophageal Speech Recognition using Artificial Neural Network (ANN)Esophageal Speech Recognition using Artificial Neural Network (ANN)
Esophageal Speech Recognition using Artificial Neural Network (ANN)
 
Speaker recognition in android
Speaker recognition in androidSpeaker recognition in android
Speaker recognition in android
 
Speaker recognition on matlab
Speaker recognition on matlabSpeaker recognition on matlab
Speaker recognition on matlab
 
Kc3517481754
Kc3517481754Kc3517481754
Kc3517481754
 
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
 
Ijetcas14 426
Ijetcas14 426Ijetcas14 426
Ijetcas14 426
 
speech enhancement
speech enhancementspeech enhancement
speech enhancement
 
Speech Signal Processing
Speech Signal ProcessingSpeech Signal Processing
Speech Signal Processing
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Review Paper on Noise Reduction Using Different Techniques
Review Paper on Noise Reduction Using Different TechniquesReview Paper on Noise Reduction Using Different Techniques
Review Paper on Noise Reduction Using Different Techniques
 
A Survey on Speaker Recognition System
A Survey on Speaker Recognition SystemA Survey on Speaker Recognition System
A Survey on Speaker Recognition System
 
Automatic speech recognition system using deep learning
Automatic speech recognition system using deep learningAutomatic speech recognition system using deep learning
Automatic speech recognition system using deep learning
 
Wavelet Based Feature Extraction for the Indonesian CV Syllables Sound
Wavelet Based Feature Extraction for the Indonesian CV Syllables SoundWavelet Based Feature Extraction for the Indonesian CV Syllables Sound
Wavelet Based Feature Extraction for the Indonesian CV Syllables Sound
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognition
 
Course report-islam-taharimul (1)
Course report-islam-taharimul (1)Course report-islam-taharimul (1)
Course report-islam-taharimul (1)
 
Homomorphic speech processing
Homomorphic speech processingHomomorphic speech processing
Homomorphic speech processing
 

Viewers also liked

Viewers also liked (20)

El deporte
El deporteEl deporte
El deporte
 
Ejer.elementos narrac
Ejer.elementos narracEjer.elementos narrac
Ejer.elementos narrac
 
Trust & Ethics in PR — PRSA Training Series
Trust & Ethics in PR — PRSA Training SeriesTrust & Ethics in PR — PRSA Training Series
Trust & Ethics in PR — PRSA Training Series
 
Breathing for brass players nick article v3
Breathing for brass players   nick article v3Breathing for brass players   nick article v3
Breathing for brass players nick article v3
 
04 isc 151 capitulo vi
04 isc 151 capitulo vi04 isc 151 capitulo vi
04 isc 151 capitulo vi
 
Internet como herramienta para Marketing Personal
Internet como herramienta para Marketing PersonalInternet como herramienta para Marketing Personal
Internet como herramienta para Marketing Personal
 
Best Tech Practices for Professional Learning
Best Tech Practices for Professional LearningBest Tech Practices for Professional Learning
Best Tech Practices for Professional Learning
 
It Presentation
It PresentationIt Presentation
It Presentation
 
Producto 14-17- tics
Producto  14-17- ticsProducto  14-17- tics
Producto 14-17- tics
 
Deber de estructura de datos de lorena cabrera
Deber de estructura de datos de lorena cabreraDeber de estructura de datos de lorena cabrera
Deber de estructura de datos de lorena cabrera
 
La globalización
La globalizaciónLa globalización
La globalización
 
Trabajo completo total
Trabajo completo totalTrabajo completo total
Trabajo completo total
 
Deporte
DeporteDeporte
Deporte
 
Deporte
DeporteDeporte
Deporte
 
Alex vasquez
Alex vasquezAlex vasquez
Alex vasquez
 
Producto 14--17- TICS
Producto  14--17- TICSProducto  14--17- TICS
Producto 14--17- TICS
 
Direitos Humanos
Direitos HumanosDireitos Humanos
Direitos Humanos
 
Vasquez alex
Vasquez alexVasquez alex
Vasquez alex
 
A HistóRia De Israel
A HistóRia De IsraelA HistóRia De Israel
A HistóRia De Israel
 
Artigo 1 Praticas De Mmarkting
Artigo 1 Praticas De MmarktingArtigo 1 Praticas De Mmarkting
Artigo 1 Praticas De Mmarkting
 

Similar to Enhanced Speech Recognition Using DCT and IWT

Combined feature extraction techniques and naive bayes classifier for speech ...
Combined feature extraction techniques and naive bayes classifier for speech ...Combined feature extraction techniques and naive bayes classifier for speech ...
Combined feature extraction techniques and naive bayes classifier for speech ...csandit
 
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...cscpconf
 
Speech Recognition in Artificail Inteligence
Speech Recognition in Artificail InteligenceSpeech Recognition in Artificail Inteligence
Speech Recognition in Artificail InteligenceIlhaan Marwat
 
Development of Algorithm for Voice Operated Switch for Digital Audio Control ...
Development of Algorithm for Voice Operated Switch for Digital Audio Control ...Development of Algorithm for Voice Operated Switch for Digital Audio Control ...
Development of Algorithm for Voice Operated Switch for Digital Audio Control ...IJMER
 
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLEMULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLEIRJET Journal
 
Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Editor IJARCET
 
Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Editor IJARCET
 
SPEECH ENHANCEMENT USING KERNEL AND NORMALIZED KERNEL AFFINE PROJECTION ALGOR...
SPEECH ENHANCEMENT USING KERNEL AND NORMALIZED KERNEL AFFINE PROJECTION ALGOR...SPEECH ENHANCEMENT USING KERNEL AND NORMALIZED KERNEL AFFINE PROJECTION ALGOR...
SPEECH ENHANCEMENT USING KERNEL AND NORMALIZED KERNEL AFFINE PROJECTION ALGOR...sipij
 
Bachelors project summary
Bachelors project summaryBachelors project summary
Bachelors project summaryAditya Deshmukh
 
IRJET- Voice Command Execution with Speech Recognition and Synthesizer
IRJET- Voice Command Execution with Speech Recognition and SynthesizerIRJET- Voice Command Execution with Speech Recognition and Synthesizer
IRJET- Voice Command Execution with Speech Recognition and SynthesizerIRJET Journal
 
Isolated Word Recognition System For Tamil Spoken Language Using Back Propaga...
Isolated Word Recognition System For Tamil Spoken Language Using Back Propaga...Isolated Word Recognition System For Tamil Spoken Language Using Back Propaga...
Isolated Word Recognition System For Tamil Spoken Language Using Back Propaga...CSEIJJournal
 
Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...karthik annam
 
A survey on Enhancements in Speech Recognition
A survey on Enhancements in Speech RecognitionA survey on Enhancements in Speech Recognition
A survey on Enhancements in Speech RecognitionIRJET Journal
 
A Review On Speech Feature Techniques And Classification Techniques
A Review On Speech Feature Techniques And Classification TechniquesA Review On Speech Feature Techniques And Classification Techniques
A Review On Speech Feature Techniques And Classification TechniquesNicole Heredia
 
Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approachijsrd.com
 
On the use of voice activity detection in speech emotion recognition
On the use of voice activity detection in speech emotion recognitionOn the use of voice activity detection in speech emotion recognition
On the use of voice activity detection in speech emotion recognitionjournalBEEI
 
Snorm–A Prototype for Increasing Audio File Stepwise Normalization
Snorm–A Prototype for Increasing Audio File Stepwise NormalizationSnorm–A Prototype for Increasing Audio File Stepwise Normalization
Snorm–A Prototype for Increasing Audio File Stepwise NormalizationIJERA Editor
 
Financial Transactions in ATM Machines using Speech Signals
Financial Transactions in ATM Machines using Speech SignalsFinancial Transactions in ATM Machines using Speech Signals
Financial Transactions in ATM Machines using Speech SignalsIJERA Editor
 

Similar to Enhanced Speech Recognition Using DCT and IWT (20)

Combined feature extraction techniques and naive bayes classifier for speech ...
Combined feature extraction techniques and naive bayes classifier for speech ...Combined feature extraction techniques and naive bayes classifier for speech ...
Combined feature extraction techniques and naive bayes classifier for speech ...
 
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
COMBINED FEATURE EXTRACTION TECHNIQUES AND NAIVE BAYES CLASSIFIER FOR SPEECH ...
 
Speech Recognition in Artificail Inteligence
Speech Recognition in Artificail InteligenceSpeech Recognition in Artificail Inteligence
Speech Recognition in Artificail Inteligence
 
Ijetcas14 390
Ijetcas14 390Ijetcas14 390
Ijetcas14 390
 
Development of Algorithm for Voice Operated Switch for Digital Audio Control ...
Development of Algorithm for Voice Operated Switch for Digital Audio Control ...Development of Algorithm for Voice Operated Switch for Digital Audio Control ...
Development of Algorithm for Voice Operated Switch for Digital Audio Control ...
 
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLEMULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
MULTILINGUAL SPEECH TO TEXT CONVERSION USING HUGGING FACE FOR DEAF PEOPLE
 
Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189
 
Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189Volume 2-issue-6-2186-2189
Volume 2-issue-6-2186-2189
 
SPEECH ENHANCEMENT USING KERNEL AND NORMALIZED KERNEL AFFINE PROJECTION ALGOR...
SPEECH ENHANCEMENT USING KERNEL AND NORMALIZED KERNEL AFFINE PROJECTION ALGOR...SPEECH ENHANCEMENT USING KERNEL AND NORMALIZED KERNEL AFFINE PROJECTION ALGOR...
SPEECH ENHANCEMENT USING KERNEL AND NORMALIZED KERNEL AFFINE PROJECTION ALGOR...
 
Bachelors project summary
Bachelors project summaryBachelors project summary
Bachelors project summary
 
IRJET- Voice Command Execution with Speech Recognition and Synthesizer
IRJET- Voice Command Execution with Speech Recognition and SynthesizerIRJET- Voice Command Execution with Speech Recognition and Synthesizer
IRJET- Voice Command Execution with Speech Recognition and Synthesizer
 
K010416167
K010416167K010416167
K010416167
 
Isolated Word Recognition System For Tamil Spoken Language Using Back Propaga...
Isolated Word Recognition System For Tamil Spoken Language Using Back Propaga...Isolated Word Recognition System For Tamil Spoken Language Using Back Propaga...
Isolated Word Recognition System For Tamil Spoken Language Using Back Propaga...
 
Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...Performance estimation based recurrent-convolutional encoder decoder for spee...
Performance estimation based recurrent-convolutional encoder decoder for spee...
 
A survey on Enhancements in Speech Recognition
A survey on Enhancements in Speech RecognitionA survey on Enhancements in Speech Recognition
A survey on Enhancements in Speech Recognition
 
A Review On Speech Feature Techniques And Classification Techniques
A Review On Speech Feature Techniques And Classification TechniquesA Review On Speech Feature Techniques And Classification Techniques
A Review On Speech Feature Techniques And Classification Techniques
 
Speaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization ApproachSpeaker Recognition System using MFCC and Vector Quantization Approach
Speaker Recognition System using MFCC and Vector Quantization Approach
 
On the use of voice activity detection in speech emotion recognition
On the use of voice activity detection in speech emotion recognitionOn the use of voice activity detection in speech emotion recognition
On the use of voice activity detection in speech emotion recognition
 
Snorm–A Prototype for Increasing Audio File Stepwise Normalization
Snorm–A Prototype for Increasing Audio File Stepwise NormalizationSnorm–A Prototype for Increasing Audio File Stepwise Normalization
Snorm–A Prototype for Increasing Audio File Stepwise Normalization
 
Financial Transactions in ATM Machines using Speech Signals
Financial Transactions in ATM Machines using Speech SignalsFinancial Transactions in ATM Machines using Speech Signals
Financial Transactions in ATM Machines using Speech Signals
 

Recently uploaded

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 

Recently uploaded (20)

Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 

Enhanced Speech Recognition Using DCT and IWT

  • 1. Sukhdeep Kaur et al Int. Journal of Engineering Research and Applications ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.749-754 RESEARCH ARTICLE www.ijera.com OPEN ACCESS Enhancement of Speech Recognition Algorithm Using DCT and Inverse Wave Transformation Sukhdeep Kaur, Er. Gurwinder Kaur Research Student, Assistant Professor (Electronics and Communication Engineering Department Yadavindra College of Engineering (YCoE) Talwandi Sabo, District Bathinda, Punjab, India.) Electronics and Communication Engineering Department Yadavindra College of Engineering (YCoE), Talwandi Sabo, District Bathinda, Punjab, India.) ABSTRACT This research work focus on providing better performance in speech recognition algorithm by integrating digital signal transposition with speech recognition techniques. This is an approach for improving the performance of speech recognition algorithm using butterworth stopband filter and discrete cosine transformation based speech compression with inverse wave transformation. The main objective is to integrate filtering with speech recognition algorithm to improve the results when noise is present in the signal. In this work, the matching is done using inverse wave transformations which reduce the time for recognition of voices. Proposed algorithm is designed and implemented in MATLAB. The proposed algorithm has been tested on the given samples and evaluated using different recognizable and unrecognizable samples obtaining a recognition ratio about 98%. It is shown that the proposed algorithm provides better results than existing techniques. Proposed algorithm increase the accuracy of the speech recognition system. Keywords: filter, inverse wave transformation, speech recognition, wave format. I. Introduction The speech recognition is defined as the process of considering the spoken word as an input speech and matches it with the previously recorded speeches on basis of various parameters. This can be done by various methods. It is a process of automatically recognizing who is speaking on the basis of features of speaker of the speech signal. Speech recognition features has some of the advantages like speech input is easy to do because it does not demand a specialized skill as does typing or push button procedures. Information can be input even when the user is not constant or doing other activities including the hands, eyes, legs or ears. Since a telephone or microphone can be used as an input terminal [1]. Basically, speaker recognition is classified in to speaker identification and speaker verification. Wide application of speech recognition system includes control access to services such as database access services banking by telephone, voice dialing telephone shopping. Now speech recognition technology is the most desirable technology to create new services. II. Classification of Speech Recognition Systems Most speech recognition systems can be classified according to the following categories [2]: www.ijera.com  Speaker Dependent versus Speaker Independent A speaker‐dependent speech recognition system is one that is trained to recognize the speech of only one speaker. A speaker‐independent system is one that is independence is difficult to attain, as speech recognition organizations tend to become adjusted to the speakers they are trained on, resulting in error rates are higher than speaker dependent system.  Isolated versus Continuous In isolated speech the speaker pauses shortly between every word, while in continuous speech the speaker speaks in a continuous and possibly long stream with little or no breaks in between. Isolated speech recognition systems are easy to build. Words spoken in continuous speech on the other hand are subjected to the co-articulation effect, in which the pronunciation of a word is modified by the words surrounding it.  Keyword based versus Subword unit based A speech recognition system can be trained to recognize whole words, like dog or cat. This is useful in applications like voice‐command‐systems, in which the system need only recognize a small set of words. This approach is simple but not scalable. As the dictionary of recognized words grow, so too the complexity and execution time of the recognizer. 749 | P a g e
  • 2. Sukhdeep Kaur et al Int. Journal of Engineering Research and Applications ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.749-754 III. www.ijera.com Approaches to Speech Recognition Basically there exist three approaches to speech recognition. They are [11]:  Acoustic Phonetic Approach Acoustic-phonetic approach assumes that the phonetic units are broadly characterized by a set of features such as format frequency, voiced/unvoiced and pitch. These features are extracted from the speech signal and are used to segment and level the speech.  Pattern Recognition Approach Pattern recognition approach requires no explicit knowledge of speech. This approach has two steps – namely, training of speech patterns based on some generic spectral parameter set and recognition of patterns via pattern comparison. The popular pattern recognition techniques include template matching, Hidden Markov Model  Artificial Intelligence Approach Knowledge based approach attempts to mechanize the recognition procedure according to the way a person applies its intelligence in visualizing, analyzing and finally making a decision on the measured acoustic features. Expert system is used widely in this approach. IV. Problem Definition In problem definition define different problems in existing approaches and how these problems will be eliminated or reduced using improved audio recognition algorithm.  The methods developed so far fail or not give efficient results when there exist noise in the audio signal.  Noises add too much disturbance on the given signal and which decreases the audio recognition algorithm accuracy rate.  Noise leaves bad effects on the bandwidth of a given network so it becomes major issue in audio signals.  Exiting networks such as neural networks take much time in training period V. Proposed Work In the proposed method, the goal is to detect the speaker from the previously recorded wave samples. The main concentration is on accuracy and speed. The proposed method is implemented using MatLab. www.ijera.com Fig.1. flow-chart for speech recognition algorithm In the proposed algorithm samples of dummy speeches are taken for experiments and a database is prepared in which these speeches are saved. The speech samples are taken in the wave format. The input samples are taken from various persons by recording their voices. These are the tested voices TVi for the given system. In the testing phase test voice is match with the stored input voice and recognition decisions are made. Butterworth filter is used to remove the noise from the system. Speech signals degrade due to the presence of background noise and noise reduction is an important field of speech processing. Butterworth stopband filter is used to minimize the disturbance from the speech signals. Discrete cosine transforms (DCT) based speech compression is used to reduce the size of the speech information. It is used to speed up the system by remove the redundancy from audio information. Compression is the process of elimination of redundancy and duplicity. The DCT is very common when encoding video and speech tracks on computers. The DCT is very similar to the DFT but the output values of DCT are real numbers and the output vector is approximately twice as long as the 750 | P a g e
  • 3. Sukhdeep Kaur et al Int. Journal of Engineering Research and Applications ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.749-754 DFT output. It shows a sequence of finite data points in terms of sum of cosine functions. The 1-d discrete cosine transforms k=1, 2,...N (1) Where www.ijera.com persons is taken and then tested with the previously stored 10 certified recognized voices. Table 1 shows the number of certified audio samples that are saved according to which the inputs can be matched. For the experimental setup, 10 samples of certified audio clips are taken. 100 samples of unrecognized inputs are taken which have to be matched with certified audio samples. Again 100 samples of unknown person audio recordings are taken. Table No. 1. Set of Samples Type Sample Size Certified 10 (2) Input Recognized 100 N is the length of x, and x and y are the same size. If x is a matrix, DCT transforms its columns. The series is indexed from n = 1 and k = 1 instead of the usual n = 0 and k = 0 because MATLAB vectors run from 1 to N instead of from 0 to N- 1. The transpose of each of the tested voices TV is taken. The transpose is taken so that the speech frames that are generally shown horizontally could be converted in vertical form so that it can be easy to understand and recognize the speech frames easily. Take the input sample of voice that has to be tested. Read the wave input and take its transpose. The input sample taken is in wave format and it is tested among the database that contains previously recorded voice sample. Match the input voice with tested database one by one and find correlation. Correlation computes a measure of similarity of two input signals as they are shifted by one another. The correlation result reaches a maximum at the time when the two signals match best. The correlation value is denoted by M. The highest the value of correlation, more the two voices are similar. Find the maximum value of correlation and let Correlation be M. If the value of M is greater than the threshold then recognized else not found. If the correlation is more than the threshold value then it must be the same audio else the input audio is not detected in the database. The threshold value is set first. If the correlation value is greater than the threshold value then voice input sample will be called as recognized otherwise the tested voices will not contain the input sample of voice. Unknown Person Audio 100 The 200 samples in total are taken for testing. For the samples are taken then hits are calculated which gives the number of times the correct recognition is done when the input is present in the previously saved set of certified voices. Recognition is done by matching the input samples voices having wave format with the previously certified set of samples. Similarly, number of miss are evaluated accordingly voices when sample is not found in the record of certified voices. VI. Results The input audio signal waveform is shown in figure 2. It represents an audio signal or recording. The amplitude of the signal is measured on the yaxis and time is measured on the x-axis. This figure shows that background disturbance is present in the speech signals. The disturbance in signals degrades the performance. The accuracy rate slightly decreases as errors are introduced. Input audio signal waveform 0.8 0.6 0.4 Amplitude 0.2 0 -0.2 -0.4 -0.6 -0.8 5.1 Database The database consists of recorded voices which are used for matching with the input samples. This database consists of 100 different voices. These are different images with different poses from different type of sources. Various samples are taken in which 100 set of recognized persons and 100 set of unrecognized www.ijera.com -1 0.5 1 1.5 2 2.5 Time 3 3.5 4 4.5 5 5 x 10 Fig. 2. input audio signal waveform The spectrogram generated for a wave input is shown below in figure3. Spectrogram is Timedependent frequency analysis. In the spectrogram the time axis is the horizontal axis and frequency is the 751 | P a g e
  • 4. Sukhdeep Kaur et al Int. Journal of Engineering Research and Applications ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.749-754 vertical axis. The input audio specgram shows that background disturbance is present in the speech signals and the accuracy rate slightly decreases. www.ijera.com Filtered audio specgram 8000 7000 Input audio specgram 6000 7000 5000 Frequency 8000 6000 3000 5000 Frequency 4000 2000 4000 1000 3000 0 2000 5 1000 10 15 Time 20 25 30 Fig. 5. filtered audio specgram 0 5 10 15 Time 20 25 30 Fig. 3. input audio specgram The filtered audio signal waveform is shown in figure 4. The background disturbance in the signals degrades the performance and noise reduction is an important field of speech processing. The filtered audio signal waveform shows that background noise was successfully removed from a signal by using butterworth stopband filter. In this matching is done using inverse wave transformations which reduce the time for recognition of voices. The filtered audio waveform shows that it has more accuracy. The compressed audio signal waveform is shown in figure 6. DCT based audio compression is used with butterworth stopband filter and inverse wave transformation to remove the disturbance from the speech signals and to speed up the process. DCT based data compression is the process of encoding information utilizing fewer bits than an un-encoded representation would use through use of particular encoding schemes. Background noise and redundancy are successfully removed from a signal by using Butterworth stopband filter and DCT. The compressed audio waveform shows that results are much better than exiting methods. Compressed audio signal waveform Filtered audio signal waveform 1 1 0.5 Amplitude Amplitude 0.5 0 0 -0.5 -0.5 -1 -1 0.5 0.5 1 1.5 2 2.5 Time 3 3.5 4 4.5 5 5 x 10 Fig. 4. filtered audio signal waveform The spectrogram generated for a wave input is shown below in figure 5. The butterworth stopband filter is used to remove the disturbance from the speech signals and the accuracy is slightly increased. The matching is done using inverse wave transformations which reduce the time for recognition of voices. The filtered audio spectrogram shows that it has more accuracy than the other techniques. www.ijera.com 1 1.5 2 2.5 Time 3 3.5 4 4.5 5 5 x 10 Fig. 6. compressed audio signal waveform The spectrogram generated for a wave input is shown below in figure 7. DCT audio compression is used with butterworth stopband filter and inverse wave transformation to remove the disturbance from the speech signals and the accuracy is slightly increased. The compressed audio specgram shows that it has better results than the previous one. 752 | P a g e
  • 5. Sukhdeep Kaur et al Int. Journal of Engineering Research and Applications ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.749-754 Compressed audio specgram www.ijera.com MISS ANALYSIS 100 8000 Sample size Proposed Old technique 90 7000 80 6000 70 Number of misses Frequency 5000 4000 3000 60 50 40 30 2000 20 1000 10 0 5 10 15 Time 20 25 30 0 1 2 3 4 5 6 Number of tries Fig. 7. compressed audio specgram 7 8 9 10 Fig. 9. miss analysis 6.1 Comparison Then according to formulas accuracy and error rates are calculated. Figure 8 represents the hits analysis. It shows the difference between the number of hits of existing techniques and proposed technique. In order to distinguish them different colours has been used. In this fig X-axis represent number of tries and Y-axis represent number of hits. It has been plotted with different values of input samples and hits are calculated from the recognition of speeches. It is clearly seen that the hit ratio of old technique is very low as compared to new technique. As a result, more hits are obtained using DCT in speech recognition. Figure 10 represents the accuracy analysis. It shows the difference between the accuracy of existing techniques and proposed technique. In this fig X-axis represent number of tries and Y-axis represent accuracy in percentage. The maximum value of accuracy is 100 percent. It has been plotted with different values of input samples and accuracies are calculated from the recognition of speeches. It is clearly seen that the accuracy of new technique is very high as compared to old technique. ACCURACY ANALYSIS 100 Maximum Proposed Old technique 98 HITS ANALYSIS 96 100 Sample size Proposed Old technique 80 94 Accuracy (%) 90 Number of hits 70 60 90 88 86 50 84 40 82 30 80 20 10 0 92 1 2 3 4 5 6 Number of tries 7 8 9 10 Fig. 10. accuracy analysis 1 2 3 4 5 6 Number of tries 7 8 9 10 Fig. 8. hit analysis Figure 9 represents the miss analysis. It shows the difference between the number of misses of existing techniques and proposed technique. In this fig X-axis represent number of tries and Y-axis represent number of misses. It is clearly seen that the miss ratio of new technique is very low as compared to old technique. As a result, less misses are obtained using DCT and filter in speech recognition. Miss ratio should be minimum and hit ratio should be maximum in order to achieve full accuracy. www.ijera.com Figure 11 represents the error rate analysis. It shows the difference between the error rate of existing techniques and proposed technique. In this fig X-axis represent number of tries and Y-axis represent error rate in percentage. The maximum value of error rate is 100 percent. It is clearly seen that the error rate of new technique is very low as compared to old technique. As a result, less error rate is obtained only when there are more hits and lesser misses. It is clearly seen that error rate of the old techniques is very high as close to the maximum value of error rate and the error rate of the new technique is very low as close to X-axis. 753 | P a g e
  • 6. Sukhdeep Kaur et al Int. Journal of Engineering Research and Applications ISSN : 2248-9622, Vol. 3, Issue 6, Nov-Dec 2013, pp.749-754 ERROR RATE ANALYSIS 100 Maximum Proposed Old technique 90 80 [2] Error rate(%) 70 60 50 [3] 40 30 20 10 0 1 2 3 4 5 6 Number of tries 7 8 9 10 [4] Fig. 11. error rate analysis VII. CONCLUSION AND FUTURE WORK 7.1 Conclusion The technique is able to authenticate the particular speaker based on the individual information that is included in the audio signal. The performance of the new technique is better than the existing techniques. Most of existing techniques does recognition of speech through the text wrapping. These techniques take too much time in recognizing the speech and are not accurate in the noisy environment. So the DCT and Butterworth stopband filter is used with inverse wave transformation to obtain the more accuracy and speed up the system. The DCT packs energy in the low frequency regions. The waveforms and specgrams showed that new technique provides high accuracy rate and consumes very less time in recognition. The error rate of the proposed algorithm is very low as compared to old algorithm. 7.2 Future Scope In the future, the focus can be on reducing the noise or background disturbance that is introduced in the speech samples automatically while recording. Modified discrete cosine transform (MDCT) will be future compression algorithm, whether standalone or combination of speech and still or moving images. The various filtering techniques can be applied in order to reduce disturbance. By using these various filter techniques speech recognition will be more accurate and fast. The results can be further generalized if we are able to unite voice activation detection with this procedure we can perform speech recognition on live voices and speech. More research has to be done on this particular area to obtain more security. References [1] Susanta Kumar Sarangi, and Goutam Saha, “A Novel Approach in Feature Level for Robust Text-Independent Speaker Identification systemhigher”, IEEE Proceedings of 4th International www.ijera.com [5] [6] [7] [8] [9] [10] [11] www.ijera.com Conference on Intelligent Human Computer Interaction, December 27-29, 2012 Santosh K.Gaikwad, Bharti W. Gawali, and Pravin Yannawar, “A Review on Speech Recognition Technique”, International Journal of Computer Applications (0975 – 8887), Vol.10– No.3, November 2010. K. Ramamohan Rao and P. Yip, “Discrete Cosine Transform: Algorithms, Advantages, Applications”, Academic Press Professional, Inc san Diego, CA, USA, 1990. Swapnil D. Daphal, and Sonal K. Jagtap, “DSP Based Improved Speech Recognition System”, International Conference on Communication, Information and Computing Technology, IEEE 2012. C. Y. Fook, “Malay Speech Recognition and Audio Visual Speech Recognition”, International Conference on Biomedical Engineering (ICoBE), pp. 479-484, Feb. 2012. D. Addou, S.A. Selouani, M. Boudraa, and B. Boudraa “Transform-based multi-feature optimization for robust distributed speech recognition”, IEEE GCC conference and exhibition, februaury, 2011. M.D. Pawar, and S.M. Badave,”Speaker Identification System Using Wavelet Transformation and Neural Network”, International Journal of Computer Applications in Engineering Sciences, Vol. 1, special issue on CNS, July 2011. Yu Shao, “Bayesian Separation with Sparsity Promotion in Perceptual Wavelet Domain for Speech Enhancement and Hybrid Speech Recognition”, IEEE Transactions on Systems, Vol. 41, pp. 284294, March 2011. Ozlem Kalinli, Michael L. Seltzer, Jasha Droppo, and Alex Acero,” Noise Adaptive Training for Robust Automatic Speech Recognition”, IEEE Transactions On Audio, Speech and Language Processing, Vol. 18, No. 8, November 2010. Mohamad Adnan Al-Alaoui, Lina Al-Kanj, Jimmy Azar1, and Elias Yaacoub, “Speech Recognition using Artificial Neural Networks and Hidden Markov Models”, IMCL 2008 Conference, 16-18 April 2008. Patricia Scanlon, Daniel P. W. Ellis, and Richard B. Reilly, “Using Broad Phonetic Group Experts for Improved Speech Recognition” IEEE Transactions on audio, speech and language processing, Vol. 15, pp. 803-812, March 2007. 754 | P a g e