Gerome Jan Llames
MEECE - CCO
Project in MEE 1231: Digital Signal Processing
Yes or No Speech Recognition
Objectives:
1. To build a program, using Digital Signal Processing, that determines whether a speech signal is a Yes or a No.
2. To identify the differences between a yes and a no signal.
INTRODUCTION
According to Wikipedia, in computer science and electrical engineering, speech recognition (SR) is the translation of spoken words into text. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or just "speech to text" (STT). Some SR systems are "speaker independent", while others use "training", in which an individual speaker reads sections of text into the SR system. These systems analyze the person's specific voice and use it to fine-tune the recognition of that person's speech, resulting in more accurate transcription. Systems that do not use training are called "speaker independent" systems; systems that use training are called "speaker dependent" systems.
Speech recognition applications include voice user interfaces such as voice dialing (e.g. "Call
home"), call routing (e.g. "I would like to make a collect call"), domotic appliance control, search (e.g.
find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card
number), preparation of structured documents (e.g. a radiology report), speech-to-text processing
(e.g., word processors or emails), and aircraft (usually termed Direct Voice Input).
Although speech recognition in general is a very complex problem, even a simple program can
distinguish between the two words yes and no. Although any two words could be used for a project like
this, yes and no were chosen because there are real systems that do exactly this task. For example, calls
to a company and telephone surveys are often handled by automated systems that ask the person
questions and attempt to determine the response using speech recognition. In a situation where the
question is answered by yes or no, a yes/no speech recognition system is very useful.
DIGITAL SIGNAL PROCESSING SYSTEM
System Workflow
This project runs in real time. It starts with the speech signal, yes or no, captured through the built-in microphone of any computer. The signal is then processed in MATLAB, where the fast Fourier transform (fft) is applied. Finally, if the resulting value falls below the threshold, the program displays Yes; otherwise, it displays No.
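The processing code itself is not reproduced in this report, so the following is only a minimal sketch of how such an fft-plus-threshold decision could look in MATLAB. The definition of the f value and the frequency-band limits are assumptions; only the 44100 Hz sampling rate and the threshold of 12 come from this report.

% Minimal sketch of the FFT-plus-threshold decision (assumed implementation;
% the actual processing code is not reproduced in this report).
fs = 44100;                        % sampling frequency used in this project
X  = abs(fft(x(:)));               % magnitude spectrum of the recorded waveform x
N  = length(X);
freq = (0:N-1).' * fs / N;         % frequency axis in Hz

% Assumed definition of the f value: ratio of low-band to high-band energy.
% The 's' in yes adds high-frequency energy, so yes gives a small f and
% no gives a large f, consistent with the values in Tables 1 and 2.
lowBand  = freq > 100   & freq < 1500;
highBand = freq >= 1500 & freq < 8000;
f = sum(X(lowBand).^2) / sum(X(highBand).^2);

threshold = 12;                    % threshold reported in the Data Analysis section
if f < threshold
    disp('Yes');
else
    disp('No');
end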
DATA ANALYSIS
The two tables below show the data collected from eight real-time tests (eight different voices, each saying yes and no), during which the system achieved 100% accuracy. They list the voices, approximate frequencies, amplitudes, and f values. From these data, the word yes contains more signal energy at higher frequencies than no, while a higher amplitude occurs when the word no is spoken. With the threshold value set at 12, every test was classified correctly, giving the 100% accuracy.
Voice Frequency Amplitude f value
YesMine ~ 1400 Hz ~ 0.3 5.9218
YesBaruc ~ 1800 Hz ~ 0.25 1.8685
YesAyn ~ 1600 Hz ~ 0.6 4.2221
YesEra ~ 1400 Hz ~ 1.0 7.2813
YesGeorge ~ 1400 Hz ~ 0.45 5.5708
YesSoheib ~ 600 Hz ~ 0.35 9.6850
YesJaybee ~ 1800 Hz ~ 0.7 3.7257
YesJudilyn ~ 1400 Hz ~ 0.3 5.3822
Table 1: Data analysis of YES
Voice Frequency Amplitude f value
NoMine ~ 1000 Hz ~ 0.4 21.3383
NoBaruc ~ 1000 Hz ~ 0.4 18.6965
NoAyn ~ 900 Hz ~ 0.7 18.0442
NoEra ~ 1200 Hz ~ 0.4 20.2817
NoGeorge ~ 1000 Hz ~ 0.9 25.5547
NoSoheib ~ 450 Hz ~ 0.28 18.1870
NoJaybee ~ 1000 Hz ~ 0.8 30.7095
NoJudilyn ~ 1600 Hz ~ 0.8 15.3298
Table 2: Data analysis of No
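As a quick check, the threshold rule can be verified directly against the f values listed in Tables 1 and 2; the short MATLAB snippet below reproduces the 100% figure.

% Verify the threshold of 12 against the f values from Tables 1 and 2.
fYes = [5.9218 1.8685 4.2221 7.2813 5.5708 9.6850 3.7257 5.3822];
fNo  = [21.3383 18.6965 18.0442 20.2817 25.5547 18.1870 30.7095 15.3298];
threshold = 12;
correct  = sum(fYes < threshold) + sum(fNo >= threshold);  % 16 of 16
accuracy = correct / (numel(fYes) + numel(fNo))            % 1.00, i.e. 100%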
SCREENSHOTS OF THE DATA IN TABLE 1 & TABLE 2
[Screenshots of the recorded signals, one per voice listed in Tables 1 and 2: YesMine, YesBaruc, YesAyn, YesEra, YesGeorge, YesSoheib, YesJaybee, YesJudilyn, NoMine, NoBaruc, NoAyn, NoEra, NoGeorge, NoSoheib, NoJaybee, NoJudilyn.]
HOW TO RUN THE PROGRAM?
Step 1: Run the YesOrNoRecorder m-file. "YesOrNoRecorder"
Step 2: Run the Yes_Or_No_Project function file with a sampling frequency of 44100 Hz, which is more than twice the highest frequency of interest in the speech signal, satisfying the Nyquist criterion. "Yes_Or_No_Project(x,44100)" (A sketch of these two steps is given below.)
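Neither m-file is reproduced in this report, so the snippet below is only a rough sketch, under the assumption that the recorder uses MATLAB's standard audiorecorder interface. The recording length and bit depth are assumptions; the sampling rate and the Yes_Or_No_Project(x,44100) call come from the steps above.

% Rough sketch of the recording step (assumed; the actual YesOrNoRecorder
% m-file is not included in this report).
fs = 44100;                          % sampling frequency from Step 2
recObj = audiorecorder(fs, 16, 1);   % 16-bit, single-channel recorder (assumed settings)
disp('Say "yes" or "no"...');
recordblocking(recObj, 2);           % record for 2 seconds (assumed duration)
x = getaudiodata(recObj);            % waveform vector x passed to the classifier
Yes_Or_No_Project(x, fs);            % Step 2: classify the recording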
Watch the video here: http://www.youtube.com/watch?v=EOcp7pxQOBA&feature=youtu.be
LIMITATIONS
No external microphone was used; I relied on the built-in microphone of my laptop. Before starting each recording, I made sure there were no unnecessary loud noises in the background. Using a microphone with noise filtering would be a good improvement.
CONCLUSION
Yes or No speech recognition is a good example of a system that involves Digital Signal Processing. It makes use of the well-known fast Fourier transform (fft). Before starting the project, I had already found some interesting facts about the two words yes and no: yes ends in an unvoiced consonant sound, while no contains a voiced consonant sound. A voiced consonant is produced with vibration of the vocal cords, while an unvoiced consonant is not. Unvoiced consonants such as the 's' in yes also carry more of their energy at higher frequencies than voiced consonants, which the testing confirmed.
The data gathered during testing show the difference between the words yes and no. In the plots of the recorded waveform x, the yes signals contained more high-frequency content than the no signals, because of the 's' sound. The amplitude of the no signals, however, was much higher than that of the yes signals.
Finally, based on the data, the objectives were successfully achieved, with 100% accuracy on the recorded tests.