SPEECH RECOGNITION
SYSTEMS

TWINKLE SAHU
CSE 6TH SEM
Speech recognition system seminar


  1. SPEECH RECOGNITION SYSTEMS: TWINKLE SAHU, CSE 6TH SEM
  2. INTRODUCTION
     • Speech recognition is the process by which a computer takes a speech signal (recorded using a microphone) and converts it into words in real time. The software that carries out the steps involved is known as a 'Speech Recognition (SR) System'.
     • SR systems are usually implemented as dictation software and intelligent assistants in personal computers, smartphones, web browsers and many other devices.
  3. DESIGN OF AN SR SYSTEM
     SR systems have to deal with a number of challenges, such as:
     • The speaker's voice is often accompanied by surrounding noise, which makes accurate recognition difficult.
     • A speaker may use many different words, all of which have to be recognized accurately.
     • Accents vary from person to person, which is a major challenge.
     • A speaker may talk very quickly, yet each word still has to be recognized individually.
  4. TYPES OF SR SYSTEMS
     • Speaker-dependent SR systems: Work by learning the unique characteristics of a single person's voice, so they depend on that speaker for training.
     • Speaker-independent SR systems: Designed to recognize anyone's voice, so no speaker-specific training is involved.
  5. BASIC PRINCIPLES OF SPEECH RECOGNITION
     • The smallest unit of spoken language is known as a phoneme.
     • The English language contains approximately 44 phonemes, representing all the vowels and consonants that we use for speech.
     • For example, a typical word such as "moon" can be broken down into three phonemes: m, ue, n.
  6. • To interpret speech we must have a way of identifying the components of spoken words; phonemes act as identifying markers within speech.
     • An algorithm is then needed to interpret the speech further. The Hidden Markov Model is a mathematical model commonly used for this.
     • To create a speech recognition engine, a large database of models is built, with a model to match each phoneme.
     • When a comparison is performed, the most likely match between the spoken phoneme and the stored ones is determined, and further computations are performed.
  7. COMPONENTS OF SPEECH RECOGNITION
     • Corpus Collection: A database of speech data that is built from multiple speech samples.
  8. • Corpus collection construction for a speaker-dependent SR system (diagram).
  9. • Corpus collection construction for a speaker-independent SR system (diagram).
  10. • Signal Analyzer: Analyses the speech signal and removes the background noise, thus focusing only on the speaker's speech.
      • Acoustic Model: Identifies phonemes from the speech sample using a probability-based mathematical model.
  11. • Language Model: Identifies the words, and thus the sentences, uttered by the speaker from the phonemes by making use of a dictionary file and a grammar file.
  12. PROCESS OF SPEECH RECOGNITION: spoken input "PAIN" → SPEECH ANALYZER
  13. SPEECH ANALYZER → /p/--/ae/--/n/
  14. ACOUSTIC MODEL: /p/--/ae/--/n/ → TRAINED HIDDEN MARKOV MODEL → CORRECT /p/--/ae/--/n/
  15. LANGUAGE MODEL: /p/--/ae/--/n/ → DICTIONARY FILE → pain → GRAMMAR FILE → pain → TEXT OUTPUT
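The dictionary-file step can be pictured as a lookup from a phoneme sequence to a word. The Python sketch below is only an illustration under stated assumptions: the `DICTIONARY` table and the `lookup_word` helper are hypothetical names, and the entries simply reuse the phoneme spellings that appear on the slides rather than any standard phoneme set.

```python
# Toy pronunciation dictionary (hypothetical): phoneme sequence -> word.
DICTIONARY = {
    ("p", "ae", "n"): "pain",  # pairing used in the process slides
    ("m", "ue", "n"): "moon",  # phoneme breakdown given on slide 5
}


def lookup_word(phonemes):
    """Return the word for a phoneme sequence, or None if it is unknown."""
    return DICTIONARY.get(tuple(phonemes))


print(lookup_word(["p", "ae", "n"]))
```

A real dictionary file would hold many thousands of entries and often several pronunciations per word; a plain mapping is used here only to make the idea concrete.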
  16. The Grammar File
  17. HIDDEN MARKOV MODEL
      • Markov models are an excellent way of abstracting simple concepts into a relatively easily computable form.
      • They are used in applications ranging from data compression to sound recognition.
      From this graph we can create sequences such as:
      N1 N2 N3
      N1 N2 N2 N2 N3 N3 N3 N3 N3
      N1 N1 N2 N2 N3
  18. The probability of each sequence is the product of the probabilities along its path:
      N1 N2 N3 = 0.4 x 0.8 x 0.5 = 0.16
      N1 N2 N2 N2 N3 N3 N3 N3 N3 = 0.4 x 0.2 x 0.2 x 0.8 x 0.5 x 0.5 x 0.5 x 0.5 = 0.0008
      N1 N1 N2 N2 N3 = 0.6 x 0.4 x 0.2 x 0.8 x 0.5 = 0.0192
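The arithmetic on this slide can be checked with a few lines of Python. This is a minimal sketch that simply multiplies the factors listed for each sequence; the helper name `path_probability` is an assumption made for illustration, and since the graph itself is not reproduced here, the factors are taken from the slide as given.

```python
import math


def path_probability(factors):
    """Multiply the probabilities listed along one path through the model."""
    return math.prod(factors)


# Factors copied from the slide for each state sequence.
paths = {
    "N1 N2 N3": [0.4, 0.8, 0.5],
    "N1 N2 N2 N2 N3 N3 N3 N3 N3": [0.4, 0.2, 0.2, 0.8, 0.5, 0.5, 0.5, 0.5],
    "N1 N1 N2 N2 N3": [0.6, 0.4, 0.2, 0.8, 0.5],
}

for states, factors in paths.items():
    print(f"{states}: {path_probability(factors):.4f}")
```

The three products come out as 0.1600, 0.0008 and 0.0192, which shows how quickly the probability shrinks as a path takes more steps.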
  19. This accommodates pronunciations such as:
      t ow m aa t ow - British English
      t ah m ey t ow - American English
      t ah m ey t a - Possible pronunciation when speaking quickly
  20. With sentences such as:
      I like apple juice - Very probable
      I like tomato juice - Very improbable!
      I hate apple juice - Relatively improbable
      I hate tomato juice - Relatively probable
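One way to picture how a word-level Markov model could produce ratings like these is to multiply word-to-word transition probabilities. The Python sketch below is an assumption-laden illustration: the `BIGRAM` table and every number in it are hypothetical, chosen only so that the ranking matches the labels on the slide, and `sentence_probability` is an illustrative helper rather than part of any SR toolkit.

```python
# Hypothetical word-to-word transition probabilities (not from the slides).
BIGRAM = {
    ("I", "like"): 0.7,
    ("I", "hate"): 0.3,
    ("like", "apple"): 0.95,
    ("like", "tomato"): 0.05,
    ("hate", "apple"): 0.2,
    ("hate", "tomato"): 0.8,
    ("apple", "juice"): 1.0,
    ("tomato", "juice"): 1.0,
}


def sentence_probability(sentence):
    """Multiply the transition probability of each adjacent word pair."""
    words = sentence.split()
    prob = 1.0
    for prev, cur in zip(words, words[1:]):
        # Unseen word pairs get probability 0 in this toy model.
        prob *= BIGRAM.get((prev, cur), 0.0)
    return prob


for s in ["I like apple juice", "I like tomato juice",
          "I hate apple juice", "I hate tomato juice"]:
    print(f"{s}: {sentence_probability(s):.3f}")
```

With these made-up numbers, "I like apple juice" scores highest and "I like tomato juice" lowest, mirroring the slide's ordering.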
  21. • The Markov model makes speech recognition systems more intelligent, i.e. they can accurately differentiate between similar-sounding phrases such as "James's school" and "James is cool".
      • In simpler Markov models, the state is directly visible to the observer.
      • In a hidden Markov model, the state is not directly visible, but the output, which depends on the state, is visible.
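To make the hidden/visible distinction concrete, the sketch below recovers the most likely hidden state sequence from visible outputs using the Viterbi algorithm, a standard decoding method for hidden Markov models. The slides do not name a decoding algorithm, so this choice is an assumption, as are the state names, the observation labels ("burst", "voiced", "nasal") and every probability in the tables.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               state that path was in at time t - 1)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            column[s] = max(
                (best[-1][p][0] * trans_p[p][s] * emit_p[s][obs], p)
                for p in states
            )
        best.append(column)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))


# Hypothetical left-to-right model for the phonemes p, ae, n.
states = ["p", "ae", "n"]
start_p = {"p": 1.0, "ae": 0.0, "n": 0.0}
trans_p = {
    "p": {"p": 0.6, "ae": 0.4, "n": 0.0},
    "ae": {"p": 0.0, "ae": 0.2, "n": 0.8},
    "n": {"p": 0.0, "ae": 0.0, "n": 1.0},
}
emit_p = {
    "p": {"burst": 0.8, "voiced": 0.1, "nasal": 0.1},
    "ae": {"burst": 0.1, "voiced": 0.8, "nasal": 0.1},
    "n": {"burst": 0.05, "voiced": 0.05, "nasal": 0.9},
}

observed = ["burst", "voiced", "voiced", "nasal"]
print(viterbi(observed, states, start_p, trans_p, emit_p))
```

Only the acoustic labels are observed; the phoneme sequence is inferred. For this toy input the decoded sequence is p, ae, ae, n, i.e. the vowel is held for two frames.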
  22. PERFORMANCE OF AN SR SYSTEM
      • Accuracy is usually rated with the word error rate (WER), whereas speed is measured with the real-time factor.
      • Other measures of accuracy include the Single Word Error Rate (SWER) and the Command Success Rate (CSR).
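Both headline measures are simple to compute. WER is conventionally the number of word substitutions, deletions and insertions needed to turn the recognized text into the reference, divided by the number of reference words; the real-time factor is processing time divided by audio duration. The Python sketch below follows those conventional definitions; the function names and the sample sentences are assumptions for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete every reference word
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[-1][-1] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """Values below 1.0 mean the recognizer runs faster than real time."""
    return processing_seconds / audio_seconds


print(word_error_rate("I like apple juice", "I like tomato juice"))
print(real_time_factor(2.0, 4.0))
```

Here one of four reference words is wrong, giving a WER of 0.25, and two seconds of processing for four seconds of audio gives a real-time factor of 0.5.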

  23. Factors affecting the accuracy of an SR system:
      • Vocabulary size and confusability
      • Speaker dependence vs. independence
      • Isolated, discontinuous, or continuous speech
      • Task and language constraints
      • Read vs. spontaneous speech
      • Adverse conditions
  24. APPLICATIONS
      • Health Care
      • Military
        - High Performance Aircraft
        - Air Traffic Control Systems
      • Telephony
        - Smartphones
        - Customer Helpline Services
      • Personal Computers
  25. SIRI AND GOOGLE NOW
      • Siri is an intelligent personal assistant developed by Apple.
      • Google Now is an intelligent personal assistant developed by Google.
      • Both use a combination of speaker-dependent and speaker-independent SR systems.
  26. CONCLUSION
      • Speech recognition systems are an indispensable part of the ever-advancing field of human-computer interaction.
      • More research is needed to tackle the challenges that remain.
      Thank You!
