An Introduction to Speech Recognition


                     Advance Electronic Devices
                     EC - 410




Instructor:          By:
Dr. M Ravibabu       Mayank Awasthi (2006033)
Topics to be covered

   Overview
   Speech Production
   SR system
   Why Speech Recognition is difficult
   Current Software Options for PC
   Applications
   References
Overview
   Speech is the vocalized form of human communication.
   Each spoken word is created out of the phonetic combination of
    a limited set of vowel and consonant speech sound units.

   Speech recognition is the ability of a machine or program
    to identify words and phrases in spoken language and
    convert them to a machine-readable format.

   Speech recognition has evolved quite a bit over the past few
    years. Initially, it used to work in discrete dictation mode, where
    you had to pause between each spoken word. Today, however,
    it uses continuous dictation. It’s also become smarter, with its
    own set of grammar rules to make out the meaning of what’s
    being said.
Speech Production

   Normal human speech is produced with pulmonary pressure
    provided by the lungs, which creates phonation at the glottis in
    the larynx; the vocal tract then modifies this sound into
    different vowels and consonants.
   Knowledge of how the various speech sounds are generated helps us to
    understand their properties.
   In short, we can say that sound is generated when the vocal tract is
    excited.
   The mode of excitation can be of 3 types:
    1). Periodic -------------- in case of vowels
    2). Aperiodic ------------ in case of unvoiced consonants
    3). Mixed ----------------- e.g. voiced fricatives such as /z/
Contd.
   In case of voiced sounds such as vowels, the excitation is periodic.
    The periodic opening & closing of the glottis results in puffs of air
    exciting the vocal tract.
   If we assume 340 m/s (34000 cm/s) as the speed of sound in air and
    17 cm as the length of the vocal tract from glottis to lips, and treat
    the tract as a tube closed at the glottis and open at the lips, the
    lowest resonance frequency can be calculated as
                          f = c / λ = c / (4L) = 34000 / (4 × 17) = 500 Hz

   The higher resonances fall at odd multiples: 1500 Hz, 2500 Hz, etc.
    Thus we should expect peaks in the frequency spectrum of the
    vowel at these frequencies.
   These peaks in the spectrum, due to resonance in the
    vocal tract, are called formants.
   Different speech sounds are generated by changing the
    resonant cavity, resulting in different values of frequency,
    amplitude and bandwidth of the formants.
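
The quarter-wavelength calculation above can be sketched in a few lines of Python. This is only a minimal sketch, assuming the vocal tract behaves like a uniform tube closed at the glottis and open at the lips; the function name `tube_resonances` and its parameters are illustrative and do not come from the slides.

```python
def tube_resonances(length_cm, speed_cm_per_s=34000.0, count=3):
    """Resonances of a uniform tube closed at one end (assumed vocal tract model).

    A closed-open tube resonates at odd multiples of c / (4 * L).
    """
    first = speed_cm_per_s / (4.0 * length_cm)
    return [(2 * k - 1) * first for k in range(1, count + 1)]


# 17 cm tract and 340 m/s = 34000 cm/s, the values used in the slide.
print(tube_resonances(17.0))  # -> [500.0, 1500.0, 2500.0]
```

A real vocal tract is not a uniform tube, so measured formants move away from these idealised values as the tongue, jaw and lips change the cavity shape.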
Contd.

    [Figure: Source excitation -> Time-varying filter representing the vocal tract -> Output speech wave]
                Source-Filter Model of speech production


•    We know that s(n) = e(n) * h(n), where e(n) is the source excitation,
     h(n) is the impulse response of the vocal tract filter and * denotes convolution.
•    The figure shows typical spectra of two speech sounds of the
     Hindi word “ki” on a log scale: red for ‘/i/’ and black for ‘/k/’.
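
The relation s(n) = e(n) * h(n) can be illustrated with a toy example. The sketch below assumes an idealised impulse-train excitation and a made-up three-sample impulse response standing in for the vocal tract; both are hypothetical values chosen for readability, not measured speech data.

```python
def convolve(e, h):
    """Linear convolution: s(n) = sum over k of e(k) * h(n - k)."""
    s = [0.0] * (len(e) + len(h) - 1)
    for i, e_i in enumerate(e):
        for j, h_j in enumerate(h):
            s[i + j] += e_i * h_j
    return s


def impulse_train(length, period):
    """Idealised periodic (voiced) excitation: a unit pulse every `period` samples."""
    return [1.0 if n % period == 0 else 0.0 for n in range(length)]


# Hypothetical decaying impulse response standing in for the vocal tract h(n).
h = [1.0, 0.5, 0.25]
e = impulse_train(length=8, period=4)
s = convolve(e, h)
print(s)  # -> [1.0, 0.5, 0.25, 0.0, 1.0, 0.5, 0.25, 0.0, 0.0, 0.0]
```

Each excitation pulse produces one copy of the filter's response in the output, which is why the filter shape can be studied separately from the pulse rate.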
Contd.
   Speech sounds are characterized by the size and shape of the filter (vocal
    cavity), which is represented by the spectrum of the filter, H(k).
    Therefore, source characteristics such as fundamental frequency,
    signal amplitude etc. can be ignored in speech recognition.
   Convolution in time becomes multiplication in frequency, so the log power
    spectrum of the speech signal is the sum of the log power spectra of the
    source and the filter. The power spectrum of the source varies rapidly
    with frequency, whereas that of the filter varies slowly. Therefore, if we pass
    this composite log power spectrum through a low-pass filter, only the
    characteristic of the filter remains.
   This process is called liftering & can be achieved by just taking the
    inverse Fourier transform of the log power spectrum and retaining the first few
    components. The resulting sequence is called the cepstrum and
    its values are called cepstral coefficients.
                  cep(q) = IFFT{ log(|S(k)|^2) },   q = 0, 1, 2, ..., N-1.
   Most SR systems use cepstral coefficients and their time
    derivatives as features for representing speech sounds.
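
The cepstrum formula above can be tried out on a toy signal. The following is a minimal sketch using only the Python standard library; the naive `dft` helper stands in for a proper FFT routine, and the names `cepstrum` and `lifter`, the `floor` guard against log(0) and the echo test signal are all illustrative assumptions rather than part of the original slides.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (stand-in for an FFT routine)."""
    n = len(x)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def cepstrum(signal, floor=1e-12):
    """cep(q) = IFFT{ log(|S(k)|^2) } for q = 0 .. N-1."""
    spectrum = dft(signal)
    # The floor avoids log(0) for empty spectral bins (an added safeguard).
    log_power = [math.log(max(abs(s_k) ** 2, floor)) for s_k in spectrum]
    return [c.real for c in dft(log_power, inverse=True)]


def lifter(cep, keep):
    """Low-time liftering: retain only the first `keep` coefficients."""
    return cep[:keep]


# Toy input: a unit pulse plus an echo of amplitude 0.5 delayed by 8 samples.
signal = [0.0] * 64
signal[0] = 1.0
signal[8] = 0.5
cep = cepstrum(signal)
features = lifter(cep, keep=13)
```

In this toy case the repeated structure of the signal shows up as a peak of roughly 0.5 at index 8 of `cep`, while `features` keeps only the slowly varying part; keeping 13 coefficients is a common but arbitrary choice here.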
Speech Recognition System
Contd.

   First, the user gives a voice command over the microphone, which is
    passed to the sound card in the system. This analog signal is
    sampled and converted into digital form using a technique called Pulse
    Code Modulation or PCM. The resulting digital waveform is a stream of
    amplitude values that, when plotted, look like a wavy line.

   The digital signal is then split into short segments and each segment is
    converted into the frequency domain. So, the incoming stream is now a set of discrete
    frequency bands, in a form that can be used by the speech recognizer.
   The next stage involves recognizing these bands of frequencies.
    For this, the speech recognition software has a database containing
    thousands of reference frequency patterns for "phonemes", as the basic sound units are called.
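
The first two stages described above, PCM conversion and the move to the frequency domain, can be sketched as follows. This is a simplified illustration under stated assumptions: the input is a synthetic 1000 Hz tone rather than microphone audio, the sample rate and frame length are arbitrary, and the helper names `pcm_encode` and `magnitude_spectrum` are hypothetical.

```python
import cmath
import math


def pcm_encode(samples, bits=16):
    """Quantise samples in [-1.0, 1.0] to signed integers (linear PCM)."""
    peak = 2 ** (bits - 1) - 1
    return [int(round(max(-1.0, min(1.0, x)) * peak)) for x in samples]


def magnitude_spectrum(frame):
    """Magnitudes of the first N/2 DFT bins of one frame (naive O(N^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


# Assumed toy input: a 1000 Hz tone sampled at 8000 Hz, one 64-sample frame.
sample_rate = 8000
frame_len = 64
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(frame_len)]

pcm = pcm_encode(tone)                      # stream of integer amplitudes
mags = magnitude_spectrum(pcm)              # one frame in the frequency domain
peak_bin = max(range(len(mags)), key=lambda k: mags[k])
peak_hz = peak_bin * sample_rate / frame_len
print(peak_hz)  # -> 1000.0
```

A recognizer would repeat the second step for every frame of the recording, producing the sequence of frequency-band patterns that the later matching stage works on.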
Contd.

   A phoneme is the smallest unit of sound that distinguishes one word from
    another in a language. If one phoneme replaces another in a word, the
    word would have a different meaning. For example, if the "b" in "bat"
    were replaced by the phoneme "r", the meaning would change to "rat".
   Ex: Kit vs Skill. /k/ is aspirated in the first case & not in the second, yet
    in English both count as the same phoneme because the difference never changes the meaning.

   The phoneme database is used to match the audio frequency
    bands that were extracted. So, for example, if the incoming signal
    sounds like a "t", the software will try to match it to the corresponding
    phoneme in the database. Each phoneme is tagged with a feature
    number, which is then assigned to the incoming signal.
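
The database lookup described above can be pictured as a nearest-template search. The sketch below is purely illustrative: the three-element feature vectors, the feature numbers and the name `match_phoneme` are invented for the example, and a real recognizer would use richer features and statistical models rather than a plain Euclidean distance.

```python
import math

# Hypothetical templates: each phoneme is tagged with a feature number and a
# made-up reference feature vector (real systems would use cepstral features).
PHONEME_DB = {
    "t": {"feature_number": 1, "template": [0.9, 0.1, 0.3]},
    "b": {"feature_number": 2, "template": [0.2, 0.8, 0.5]},
    "r": {"feature_number": 3, "template": [0.4, 0.4, 0.9]},
}


def match_phoneme(features, database=PHONEME_DB):
    """Return (phoneme, feature_number) of the closest template (Euclidean)."""
    best = min(database, key=lambda p: math.dist(features, database[p]["template"]))
    return best, database[best]["feature_number"]


# An incoming frame whose features lie close to the "t" template.
print(match_phoneme([0.85, 0.15, 0.25]))  # -> ('t', 1)
```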
Why SR is difficult?

   A given word is spoken by different persons, and different persons
    have different spectral properties. Ex: females typically have a shorter vocal
    tract than males, so the formant frequencies of a female speaker are
    higher than those of a male speaker.
   The properties of a sound depend not only on the identity of
    the corresponding phoneme but also on the neighbouring
    sounds. Ex: a speaker mispronounces the long word
    “Thiruvananthapuram” as “tiruvanthpuram”. Human beings don’t
    have any problem in mapping it to the correct word.
    However, such cases pose a problem for a machine.
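
The effect of vocal tract length on formants can be made concrete with the same uniform-tube approximation used for the 17 cm example earlier in the deck. This is a rough sketch: the 14.5 cm figure is an assumed, illustrative length for a shorter vocal tract, and `first_resonance_hz` is a hypothetical helper name.

```python
def first_resonance_hz(tract_length_cm, speed_cm_per_s=34000.0):
    """First resonance of a uniform closed-open tube: c / (4 * L)."""
    return speed_cm_per_s / (4.0 * tract_length_cm)


# 17 cm is the length used earlier in the slides; 14.5 cm is an assumed,
# illustrative value for a shorter vocal tract.
longer = first_resonance_hz(17.0)
shorter = first_resonance_hz(14.5)
print(round(longer), round(shorter))  # -> 500 586
```

The same vowel therefore lands at noticeably different frequencies for the two speakers, which is one reason a recognizer cannot rely on fixed formant positions.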
Current Software Options for PC

   Dragon Systems – Naturally Speaking
   Philips – FreeSpeech
   IBM – ViaVoice
   Lernout & Hauspie – Voice Xpress
Applications

   Military: Of particular note are the U.S. programs in speech
    recognition for the Advanced Fighter Technology Integration
    (AFTI)/F-16 aircraft (F-16 VISTA) and the program in France on
    installing speech recognition systems on Mirage aircraft. In
    these programs, speech recognizers have been operated
    successfully in fighter aircraft with applications including: setting
    radio frequencies, commanding an autopilot system, setting
    steer-point coordinates and weapons release parameters, and
    controlling flight displays.
   People with disabilities
   Telephony and other domains
References

   www.google.com
   www.wikipedia.org
   www.esnips.com
