Voice Recognition
Using MATLAB
Presented by: Avienash Raibole
Paresh Meshram
Vinayak Kolpek
INTRODUCTION
• The purpose of our project is to implement an
efficient voice recognition algorithm using
MATLAB.
• Voice recognition is the process of converting
an acoustic signal, captured by a microphone
or a telephone, to a set of words.
• The recognised words can be an end in
themselves, as in applications such as
command and control, data entry, and
document preparation.
• They can also serve as the input to further
linguistic processing in order to achieve
speech understanding.
What we can do with voice recognition
• Transcription
– dictation, information retrieval
• Command and control
– data entry, device control, navigation, call routing
• Information access
– airline schedules, stock quotes, directory
assistance
• Problem solving
– travel planning, logistics
PRINCIPLE
Speaker Recognition methods
Text Dependent:
The speaker’s identity is established from his or her
speaking one or more specific phrases.
Text Independent:
Speaker models capture characteristics of
somebody’s speech which show up
irrespective of what one is saying.
BLOCK DIAGRAM
[Figure: continuous speech → Frame Blocking → frame → Windowing → FFT → spectrum → Mel-frequency Wrapping → Cepstrum → mel cepstrum]
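The first two stages of the block diagram, frame blocking and windowing, can be sketched as follows. This is an illustrative Python sketch, not the project's MATLAB code: the frame length (256 samples), the hop (100 samples), the Hamming window, and the test tone are all assumptions chosen for the example.

```python
import math

def frame_blocking(signal, frame_len, hop):
    """Split a signal into overlapping frames of frame_len samples, hop samples apart."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append(signal[start:start + frame_len])
    return frames

def hamming(n):
    """Hamming window of length n (assumed window type; the slides do not name one)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def apply_window(frame, window):
    """Multiply a frame by the window, sample by sample, to taper its edges."""
    return [s * w for s, w in zip(frame, window)]

# Hypothetical input: 1000 samples of a 100 Hz tone sampled at 1 kHz.
signal = [math.sin(2 * math.pi * 100 * t / 1000) for t in range(1000)]

frames = frame_blocking(signal, frame_len=256, hop=100)
window = hamming(256)
windowed = [apply_window(f, window) for f in frames]
```

Overlapping frames keep the analysis smooth across frame boundaries, and the window reduces the spectral leakage that a hard cut at the frame edges would cause in the FFT stage.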
RECOGNITION MODELS
Feature Extraction
• Feature extraction takes a small amount of data
from the voice signal that can later be used to
represent each speaker.
• A wide range of possibilities exists for
parametrically representing the speech signal
for the speaker recognition task, such as
a) Linear Prediction Coding (LPC),
b) Mel-Frequency Cepstrum Coefficients
(MFCC), and others.
MFCC
• MFCC is based on the known variation of the
human ear’s critical bandwidths with
frequency: filters are spaced linearly at low
frequencies and logarithmically at high
frequencies.
• To capture the phonetically important
characteristics of speech, the signal is expressed on
the mel frequency scale.
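The linear-then-logarithmic spacing can be illustrated with a mel scale conversion. The slides do not give a formula, so this Python sketch assumes one widely used variant, mel = 2595 · log10(1 + f / 700). The filter count (20) and the frequency range (0 to 4000 Hz) are also assumptions for illustration.

```python
import math

def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (assumed formula, one common variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_low, f_high):
    """Center frequencies (Hz) of n_filters filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

# Hypothetical filter bank: 20 filters covering 0 to 4000 Hz.
centers = mel_center_frequencies(n_filters=20, f_low=0.0, f_high=4000.0)
for c in centers:
    print(round(c, 1))
```

Equal steps in mels give closely spaced filters at low frequencies and increasingly wide gaps at high frequencies, which is the behaviour described above.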
SIMPLE REPRESENTATION OF MFCC
CALCULATION OF MFCC
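The last stage of the block diagram, turning the mel spectrum into mel cepstrum coefficients, is commonly done with a discrete cosine transform of the log filter-bank energies. The Python sketch below assumes that approach. The function name, the number of filters (20), and the number of coefficients (12) are hypothetical, and the input energies must be positive.

```python
import math

def mel_cepstrum(mel_energies, n_coeffs):
    """DCT of the log mel filter-bank energies.

    Returns coefficients c_1 .. c_n_coeffs; c_0 (overall level) is skipped.
    """
    k_total = len(mel_energies)
    log_e = [math.log(e) for e in mel_energies]
    coeffs = []
    for n in range(1, n_coeffs + 1):
        c = sum(log_e[k] * math.cos(n * (k + 0.5) * math.pi / k_total)
                for k in range(k_total))
        coeffs.append(c)
    return coeffs

# A flat mel spectrum has no spectral shape, so every coefficient
# comes out as zero up to floating-point rounding.
flat = mel_cepstrum([2.0] * 20, n_coeffs=12)

# A spectrum tilted toward low frequencies gives non-zero coefficients.
tilted = mel_cepstrum([20.0 - k for k in range(20)], n_coeffs=12)
```

Skipping c_0 discards the overall energy of the frame, so the remaining coefficients describe the shape of the spectrum rather than its loudness.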
How does it work?
[Figure: record a voice → extract feature vectors]
• Record the voice command (time domain).
• Transform it into the frequency domain using the
Fourier transform and take the magnitude
spectrum.
• Compare the spectra of voice commands.
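The transform-and-compare steps above can be sketched in Python as follows. A naive DFT is used here for clarity where a real system would use an FFT. The Euclidean distance and the three test tones are assumptions for illustration, not details taken from the project.

```python
import cmath
import math

def magnitude_spectrum(frame):
    """Naive O(N^2) DFT of a real frame; returns |X[k]| for k = 0 .. N/2."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags

def spectral_distance(spec_a, spec_b):
    """Euclidean distance between two magnitude spectra (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(spec_a, spec_b)))

n = 64
# Hypothetical "commands": two tones at the same pitch with different
# phase, and one tone at a different pitch.
tone_a = [math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
tone_b = [math.sin(2 * math.pi * 4 * t / n + 1.0) for t in range(n)]
tone_c = [math.sin(2 * math.pi * 10 * t / n) for t in range(n)]

spec_a = magnitude_spectrum(tone_a)
spec_b = magnitude_spectrum(tone_b)
spec_c = magnitude_spectrum(tone_c)

peak_bin = spec_a.index(max(spec_a))  # bin 4, the tone's frequency
print(spectral_distance(spec_a, spec_b))  # near zero: same pitch
print(spectral_distance(spec_a, spec_c))  # large: different pitch
```

Taking the magnitude discards phase, so two recordings of the same sound that start at slightly different instants still produce similar spectra.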
[Figure: Digitized speech signal (.wav file) → Acoustic preprocessing (DFT + MFCC) → Speech recognizer (Dynamic Time Warping)]
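The recognizer stage can be illustrated with a minimal dynamic time warping (DTW) sketch in Python. It assumes each utterance is a sequence of feature vectors, one per frame, and that a command is recognized by picking the stored template with the lowest DTW cost. The function names, the one-dimensional features, and the "on"/"off" templates are hypothetical.

```python
import math

def euclidean(u, v):
    """Distance between two feature vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dtw_distance(seq_a, seq_b):
    """Accumulated cost of the best time alignment between two sequences."""
    inf = float("inf")
    n, m = len(seq_a), len(seq_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of: advance in a only, advance in b only, or both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(features, templates):
    """Return the template label with the smallest DTW distance to features."""
    return min(templates, key=lambda word: dtw_distance(features, templates[word]))

# Hypothetical one-dimensional features for two stored commands.
templates = {
    "on": [[1.0], [2.0], [3.0]],
    "off": [[3.0], [2.0], [1.0]],
}
# The same "on" pattern spoken more slowly: 5 frames instead of 3.
utterance = [[1.0], [1.0], [2.0], [3.0], [3.0]]

print(recognize(utterance, templates))  # on
```

DTW stretches or compresses the time axis during matching, so the slower utterance still aligns with its template at zero cost.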
Applications
• Device control.
• Hands-free mobile phone use in cars.
• Single-purpose command and control systems.
• Voice verification.
• Many more.
Advantages
• The model trains much faster than other
methods.
• It is able to reduce large datasets to a smaller
number of codebook vectors.
• Easy to implement and more accurate.
• Speech is a very natural way to interact, and it
is not necessary to sit at a keyboard or work
with a remote control.
• No training required for users.
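The reduction of a large set of feature vectors to a few codebook vectors can be sketched with a k-means style update. The slides do not say how the codebook is built, so this Python sketch is an assumption; the data points and the initial codebook are made up for illustration.

```python
def nearest(vec, codebook):
    """Index of the codeword closest to vec (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(vec, c)) for c in codebook]
    return dists.index(min(dists))

def update_codebook(vectors, codebook):
    """One pass: move each codeword to the mean of the vectors assigned to it."""
    dim = len(codebook[0])
    sums = [[0.0] * dim for _ in codebook]
    counts = [0] * len(codebook)
    for v in vectors:
        i = nearest(v, codebook)
        counts[i] += 1
        for d in range(dim):
            sums[i][d] += v[d]
    # A codeword with no assigned vectors is left where it was.
    return [[s / counts[i] for s in sums[i]] if counts[i] else list(codebook[i])
            for i in range(len(codebook))]

# Hypothetical 2-D feature vectors forming two clusters.
vectors = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
           [5.0, 5.1], [5.2, 4.9], [4.8, 5.0]]
codebook = [[0.0, 0.0], [4.0, 4.0]]  # assumed initial guess

for _ in range(5):
    codebook = update_codebook(vectors, codebook)

print(codebook)  # two codewords, near (0.1, 0.1) and (5.0, 5.0)
```

Six training vectors are summarized by two codewords here; with real speech, thousands of frames per speaker would be reduced in the same way.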
Limitations
• The number of words that our program could
recognize was limited: the more words
we tried adding, the less accurate it became.
• The voice recognition program works only for
the voice of the person it was trained on.
• The program is less accurate in noisy
environments.
• Voice recognition works best if the
microphone is close to the user.
Future of Voice Recognition
• Better rejection of extraneous speech.
• Better recognition of embedded commands.
• Better efficiency on low-cost processors.
• Standards for performance evaluation.
• Increased portability.
• Lower error rates.
• Improved overall robustness.
THANK
YOU…
