Gerome Jan Llames
MEECE - CCO
Project in MEE 1231: Digital Signal Processing
Yes or No Speech Recognition
Objectives:
1. To build a program, based on Digital Signal Processing, that detects whether a speech
signal is a yes or a no.
2. To characterize the difference between a yes signal and a no signal.
INTRODUCTION
According to Wikipedia, in computer science and electrical engineering, speech recognition (SR)
is the translation of spoken words into text. It is also known as "automatic speech recognition" (ASR),
"computer speech recognition", or just "speech to text" (STT). Some SR systems use "training", in
which an individual speaker reads sections of text into the SR system. These systems analyze the
person's specific voice and use it to fine-tune the recognition of that person's speech, resulting in
more accurate transcription. Systems that use training are called "speaker dependent" systems;
systems that do not are called "speaker independent" systems.
Speech recognition applications include voice user interfaces such as voice dialing (e.g. "Call
home"), call routing (e.g. "I would like to make a collect call"), domotic appliance control, search (e.g.
find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card
number), preparation of structured documents (e.g. a radiology report), speech-to-text processing
(e.g., word processors or emails), and aircraft (usually termed Direct Voice Input).
Speech recognition in general is a very complex problem, but even a simple program can
distinguish between the two words yes and no. Any two words could be used for a project like
this; yes and no were chosen because there are real systems that do exactly this task. For example, calls
to a company and telephone surveys are often handled by automated systems that ask the person
questions and attempt to determine the response using speech recognition. When a question
is answered with yes or no, a yes/no speech recognition system is very useful.
DIGITAL SIGNAL PROCESSING SYSTEM
System Workflow
This project runs in real time. It starts with the speech signal, a spoken yes or no, which is
captured with the computer's built-in microphone. The signal is then processed in MATLAB, where the
fft is applied. Finally, the program displays Yes if the resulting value is below the threshold and No otherwise.
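The workflow above can be sketched in a few lines. The report does not say how the value that is compared against the threshold is computed, so this sketch assumes it is the ratio of spectral energy below a split frequency to the energy above it. That definition, the 3000 Hz split, and the names fft, band_ratio and classify are illustrative assumptions, not the author's MATLAB code. The sketch is written in plain Python so that it is self-contained.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def band_ratio(x, fs, split_hz):
    """Energy below split_hz divided by energy at or above it.

    ASSUMPTION: this stands in for the report's "f value"; the real
    definition is not given in the report.
    """
    n = len(x)
    spectrum = fft(x)
    low = 0.0
    high = 0.0
    for k in range(1, n // 2):  # skip DC, positive frequencies only
        freq = k * fs / n       # frequency of bin k in Hz
        power = abs(spectrum[k]) ** 2
        if freq < split_hz:
            low += power
        else:
            high += power
    return low / high if high > 0 else float("inf")


def classify(x, fs, split_hz=3000.0, threshold=12.0):
    """Return "Yes" when the ratio is below the threshold, else "No"."""
    return "Yes" if band_ratio(x, fs, split_hz) < threshold else "No"


# Synthetic stand-ins for recordings (not real speech):
fs = 44100
n = 1024
# Mostly low-frequency content, like a voiced "no".
no_like = [math.sin(2 * math.pi * 500 * i / fs)
           + 0.05 * math.sin(2 * math.pi * 5000 * i / fs) for i in range(n)]
# Strong high-frequency content, like the "s" in "yes".
yes_like = [0.5 * math.sin(2 * math.pi * 500 * i / fs)
            + math.sin(2 * math.pi * 5000 * i / fs) for i in range(n)]

print(classify(no_like, fs), classify(yes_like, fs))
```

A ratio of low-band to high-band energy is one simple choice that matches the decision rule described: more high-frequency content lowers the ratio, which pushes the result toward Yes.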
DATA ANALYSIS
The two tables show the data collected from 8 real-time tests of each word. For each voice they
list the frequency, the amplitude and the f value. From these data we can say that the word yes has
more signal content at higher frequencies than the word no, while the word no has the higher
amplitude on average. With the threshold value at 12, every yes recording has an f value below the
threshold and every no recording has an f value above it, so all 16 recordings are classified correctly
(100% accuracy on this test set).
Voice        Frequency    Amplitude    f value
YesMine      ~ 1400 Hz    ~ 0.3         5.9218
YesBaruc     ~ 1800 Hz    ~ 0.25        1.8685
YesAyn       ~ 1600 Hz    ~ 0.6         4.2221
YesEra       ~ 1400 Hz    ~ 1.0         7.2813
YesGeorge    ~ 1400 Hz    ~ 0.45        5.5708
YesSoheib    ~ 600 Hz     ~ 0.35        9.6850
YesJaybee    ~ 1800 Hz    ~ 0.7         3.7257
YesJudilyn   ~ 1400 Hz    ~ 0.3         5.3822
Table 1: Data analysis of Yes
Voice        Frequency    Amplitude    f value
NoMine       ~ 1000 Hz    ~ 0.4        21.3383
NoBaruc      ~ 1000 Hz    ~ 0.4        18.6965
NoAyn        ~ 900 Hz     ~ 0.7        18.0442
NoEra        ~ 1200 Hz    ~ 0.4        20.2817
NoGeorge     ~ 1000 Hz    ~ 0.9        25.5547
NoSoheib     ~ 450 Hz     ~ 0.28       18.1870
NoJaybee     ~ 1000 Hz    ~ 0.8        30.7095
NoJudilyn    ~ 1600 Hz    ~ 0.8        15.3298
Table 2: Data analysis of No
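The accuracy figure can be checked directly against the f values in Tables 1 and 2. The short Python sketch below applies the threshold of 12 to each tabulated value. The variable and function names are hypothetical, and only the numbers come from the tables.

```python
THRESHOLD = 12.0  # threshold value stated in the report

# f values copied from Table 1 (yes) and Table 2 (no).
yes_f = {"YesMine": 5.9218, "YesBaruc": 1.8685, "YesAyn": 4.2221,
         "YesEra": 7.2813, "YesGeorge": 5.5708, "YesSoheib": 9.6850,
         "YesJaybee": 3.7257, "YesJudilyn": 5.3822}
no_f = {"NoMine": 21.3383, "NoBaruc": 18.6965, "NoAyn": 18.0442,
        "NoEra": 20.2817, "NoGeorge": 25.5547, "NoSoheib": 18.1870,
        "NoJaybee": 30.7095, "NoJudilyn": 15.3298}


def classify_f(f_value, threshold=THRESHOLD):
    """Below the threshold means Yes, otherwise No."""
    return "Yes" if f_value < threshold else "No"


correct = (sum(classify_f(f) == "Yes" for f in yes_f.values())
           + sum(classify_f(f) == "No" for f in no_f.values()))
total = len(yes_f) + len(no_f)
accuracy = correct / total

print(f"{correct}/{total} correct, accuracy = {accuracy:.0%}")
print("largest yes f value:", max(yes_f.values()))
print("smallest no f value:", min(no_f.values()))
```

Because the largest yes value (9.6850) and the smallest no value (15.3298) do not overlap, any threshold between them would separate these 16 recordings, and 12 sits inside that gap.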
HOW TO RUN THE PROGRAM?
Step 1: Run the YesOrNoRecorder m-file by entering "YesOrNoRecorder".
Step 2: Run the Yes_Or_No_Project function file with a sampling frequency of 44100 Hz, which is more
than twice the highest frequency in the speech signal, by entering "Yes_Or_No_Project(x,44100)".
Watch the video here: http://www.youtube.com/watch?v=EOcp7pxQOBA&feature=youtu.be
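The sampling frequency passed in Step 2 fixes both the highest frequency the recording can represent and the frequency that each fft bin corresponds to. A minimal Python sketch of that arithmetic follows. The helper names are hypothetical and are not part of the author's m-files.

```python
def nyquist_hz(fs):
    """Highest frequency representable at sampling rate fs."""
    return fs / 2.0


def bin_frequency_hz(k, n, fs):
    """Frequency in Hz of fft bin k for an n-sample signal."""
    return k * fs / n


def frequency_to_bin(freq_hz, n, fs):
    """Nearest fft bin index for a frequency in Hz."""
    return round(freq_hz * n / fs)


fs = 44100
print(nyquist_hz(fs))
print(bin_frequency_hz(1, fs, fs))
```

At 44100 Hz the recording can represent content up to 22050 Hz, and a one-second recording (44100 samples) gives bins spaced 1 Hz apart.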
LIMITATIONS
No external microphone was used; I used the built-in microphone of my laptop.
Before each recording, I made sure that there were no unnecessary loud noises. Using a
filtered microphone would be a better choice.
CONCLUSION
Yes or No Speech recognition is a good example of a system which involves Digital Signal
Processing. It makes use of the well-known Fast Fourier Transform (fft). Before I started making the
project, I had already found some interesting facts about the two words yes and no. The word yes
contains an unvoiced consonant ('s') and the word no a voiced consonant ('n'). A voiced consonant is
produced with vibration of the vocal cords, while an unvoiced consonant is produced without it.
Unvoiced consonants such as 's' also carry relatively more of their energy at high frequencies than
voiced sounds. The test results were consistent with this.
The data gathered during testing show the difference between the words yes and
no. In the plots of the signal x, the yes signals had more content at high frequencies
than the no signals, because of the 's' sound. However, the amplitude of the no signals was
higher on average than that of the yes signals.
Finally, based on the data, the objectives were achieved: all 16 test recordings were classified
correctly (100% accuracy).