Automatic
Speech Recognition
Presented By
Akash Prajapati
1309113008
Contents
 Introduction
 Types of Speech Recognition
 Human Ears Working
 Process of speech recognition
 Feature Extraction
 Acoustic and Language model
 Applications
 Advantages
 Disadvantages
 Conclusion
 References
Introduction
Speech recognition, also known as automatic speech recognition (ASR)
or computer speech recognition, is the translation of spoken words
into text. It lets a computer understand a voice and perform the
required task.
Types of Speech Recognition
 Speaker dependent  Speaker-dependent software is commonly used
for dictation.
Speaker-dependent software works by learning the unique characteristics
of a single person’s voice, in a way similar to voice recognition. New users
must first “train” the software by speaking to it, so the computer can analyze
how the person talks. This often means users have to read a few utterances
aloud before they can use the speech recognition software.
Types of Speech Recognition
 Speaker independent Speaker-independent software is designed to
recognize anyone’s voice, so no training is involved. This means it is the
only real option for applications such as interactive voice response
systems — where businesses can’t ask callers to read pages of text before
using the system.
These systems are the most difficult and expensive to develop, and their
accuracy is lower than that of speaker-dependent systems.
Listening Anatomy
Articulation produces sound
waves which the ear conveys to
the brain for processing
[2]
Process of Speech Recognition
Source-[1]
Feature Extraction
 It means capturing important qualities while discarding
unimportant and distracting features.
 Sources of Variability in Speech
(a) Speaker
(b) Microphone
(c) Pitch
(d) Environment
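As a rough illustration of keeping the important qualities and discarding the rest, the sketch below cuts a signal into short frames and keeps only two numbers per frame: log energy and zero-crossing rate. The function names, the frame sizes and the synthetic test signal are assumptions made for this example; real ASR front ends typically use richer features.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames of frame_len samples."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def log_energy(frame):
    """Log of the summed squared amplitude (small offset avoids log(0))."""
    return math.log(sum(x * x for x in frame) + 1e-10)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def extract_features(samples, frame_len=400, hop=160):
    """Reduce each frame to a (log energy, zero-crossing rate) pair."""
    return [(log_energy(f), zero_crossing_rate(f))
            for f in frame_signal(samples, frame_len, hop)]

# Synthetic input (assumption): 0.1 s of silence, then 0.1 s of a 440 Hz tone at 16 kHz.
rate = 16000
silence = [0.0] * 1600
tone = [math.sin(2 * math.pi * 440 * n / rate) for n in range(1600)]
features = extract_features(silence + tone)

print(len(features), "frames")
print("first frame:", features[0])
print("last frame:", features[-1])
```

Each 400-sample frame is reduced to two values, so the silent frames and the tone frames become easy to tell apart while most of the raw waveform detail is thrown away.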
Acoustic Model
An acoustic model is created by taking audio recordings of
speech, and their text transcriptions, and using software
to create statistical representations of the sounds that make
up each word. It is used by a speech recognition engine to
recognize speech.
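A minimal sketch of the idea of a statistical representation of sounds: one Gaussian (mean and variance) is estimated per sound label from labelled feature values, and a new value is assigned to the sound with the highest likelihood. The labels "s" and "aa", the single feature and all numbers are invented for illustration; practical acoustic models use many features per frame and far more data.

```python
import math
from collections import defaultdict

def train_acoustic_model(labelled_frames):
    """Estimate a 1-D Gaussian (mean, variance) for each sound label."""
    grouped = defaultdict(list)
    for label, value in labelled_frames:
        grouped[label].append(value)
    model = {}
    for label, values in grouped.items():
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        model[label] = (mean, max(var, 1e-6))  # floor the variance to stay numerically safe
    return model

def log_likelihood(value, mean, var):
    """Log density of a Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2 * math.pi * var) + (value - mean) ** 2 / var)

def most_likely_sound(model, value):
    """Pick the label whose Gaussian explains the value best."""
    return max(model, key=lambda label: log_likelihood(value, *model[label]))

# Made-up (label, feature value) pairs standing in for transcribed recordings.
training = [("s", 0.42), ("s", 0.47), ("s", 0.45),
            ("aa", 0.08), ("aa", 0.11), ("aa", 0.09)]
model = train_acoustic_model(training)
guess = most_likely_sound(model, 0.40)
print(guess)
```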
Language Model
 Language modeling, used in many natural language processing
applications such as speech recognition, tries to capture the
properties of a language and to predict the next word in a speech
sequence. In American English, the phrases "recognize speech" and
"wreck a nice beach" are pronounced almost the same but mean very
different things, so the language model helps choose between them.
A statistical language model is a probability distribution over
sequences of words.
Language modeling is used in speech recognition, machine
translation, part-of-speech tagging, parsing, handwriting
recognition, information retrieval and other applications.
[4]
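To make "a probability distribution over sequences of words" concrete, here is a small bigram model with add-one smoothing. The five-sentence corpus and the function names are assumptions for this sketch only; with such a tiny corpus the scores are purely illustrative.

```python
import math
from collections import Counter

def train_bigram_model(sentences):
    """Count word pairs and their left contexts over a list of sentences."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return contexts, bigrams, vocab

def sentence_log_prob(sentence, model):
    """Log probability of a sentence under the add-one smoothed bigram model."""
    contexts, bigrams, vocab = model
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(words, words[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab)))
    return total

# Toy training corpus (invented for this example).
corpus = [
    "please recognize speech",
    "machines recognize speech",
    "they recognize speech well",
    "a nice beach",
    "storms wreck a pier",
]
model = train_bigram_model(corpus)
for phrase in ("recognize speech", "wreck a nice beach"):
    print(phrase, sentence_log_prob(phrase, model))
```

Given two candidates that sound alike, a recognizer can prefer the one the language model scores higher; in this toy corpus that is "recognize speech".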
Accuracy
 Error rates increase as the vocabulary size grows.
 Vocabulary is hard to recognize if it contains confusable words.
 Accuracy also depends on whether speech is isolated, discontinuous or continuous.
Word error rate (WER) is a common metric of the performance of
a speech recognition system.
(Source: Wikipedia)
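WER is usually computed as (S + D + I) / N, where S, D and I are the numbers of substituted, deleted and inserted words and N is the number of words in the reference transcript. The sketch below finds the minimum number of such edits with a word-level edit distance; the function name and the example sentences are assumptions for illustration.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = minimum edits turning the first i reference words
    # into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution (sat -> sit) and one deletion (the) against six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(round(wer, 3))
```

Because insertions are counted, WER can exceed 1.0 when the hypothesis is much longer than the reference.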
Tool Used For ASR
SPRAAK (Speech Processing, Recognition and Automatic Annotation Kit)
 open source speech recognition package.
 efficient decoder in a proven Hidden Markov model (HMM) architecture.
 SPRAAK uses 'scons' (a Python build-tool)
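For readers unfamiliar with HMM decoding, the following is a generic Viterbi sketch that finds the most probable hidden state sequence for a series of observations. It is not SPRAAK's decoder or API; the two states, the observation symbols and every probability are invented for this toy example.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    # best[t][s] = (log probability of the best path ending in s at time t, previous state)
    best = [{s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
             for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, best[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1])
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        best.append(column)
    # Backtrack from the best final state to recover the path.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]

# Toy model (all values are assumptions): is each frame silence or speech?
states = ["sil", "speech"]
start_p = {"sil": 0.8, "speech": 0.2}
trans_p = {"sil": {"sil": 0.7, "speech": 0.3},
           "speech": {"sil": 0.2, "speech": 0.8}}
emit_p = {"sil": {"quiet": 0.9, "loud": 0.1},
          "speech": {"quiet": 0.2, "loud": 0.8}}
path = viterbi(["quiet", "loud", "loud"], states, start_p, trans_p, emit_p)
print(path)
```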
Advantages
 Documents can be generated up to three times as fast with speech
recognition as they can if they are typed.
 Helps people with disabilities who cannot easily use a keyboard.
 Reduces errors, i.e. it eliminates spelling problems.
 Reduces businesses' labour costs in call centres by allowing them to
reduce the size of the staff on duty by replacing workers with
speech recognition software.
 Speech recognition technology can also replace touch-tone dialing.
 Not necessary to sit at a keyboard or work with a remote control.
Disadvantages
 Difficult to build a perfect system.
 Filtering background noise is a task that can even be difficult for
humans to accomplish.
 Every speaker differs in voice, mouth shape and
speaking style.
 Words have to be spoken clearly and loudly.
 If a microphone is used, it should be close to the user.
Applications
1. Dictation
2. In-car systems
3. Robotics
4. High-performance fighter aircraft
5. Voice dialing
6. Ok Google
7. Voice Security System
Conclusion
The world of speech recognition is rapidly changing and evolving.
Early applications of the technology have achieved varying degrees of
success. The promise for the future is significantly higher performance
for almost every speech recognition technology area, with more
robustness to speakers, background noises etc. This will ultimately lead
to reliable, robust voice interfaces to every telecommunications service
that is offered, thereby making them universally available.
Future Scope
 A speech recognition and rectification system can be designed and
implemented for people with articulation disabilities, which would be a
valuable contribution to society. It would reduce the speech communication
problems these people face in their day-to-day
life.
 ASR tools available today are strongly affected by noise in the
surroundings and thus produce less accurate results.
References
 [1] Parwinder Pal Singh and Er. Bhupinder Singh, “Speech Recognition as Emerging
Revolutionary Technology”, Volume 2, Issue 10, October 2012.
 [2] Sanjivani S. Bhabad and Gajanan K. Kharate, “An Overview of Technical Progress in Speech
Recognition”, Proc. IEEE, Vol. 3, Issue 3, Mar. 2013.
 [3] L. R. Rabiner, “Applications of Speech Recognition in the Area of
Telecommunication”, Proc. IEEE, Vol. 82, No. 4, pp. 199-228, Feb. 1994.
 [4] http://www.hyoka.koho.titech.ac.jp/eprd/recently/research/research.php?id%3D386%26page_lang%3Den&h=426&w=650&tbnid=fI32gnA61A4vOM:&docid=v2dgmilrU_mkbM&ei=Y3BVtjnCsvK0ASzsrTwAw&tbm=isch&ved=0ahUKEwiYpfrDifrKAhVLJZQKHTMZDT4QMwg-KBcwFw
Thank You
