3. INTRODUCTION
• Speech recognition (SR) is also known as automatic speech
recognition, computer speech recognition, or voice recognition.
• It means that the computer understands a voice and
performs the required task.
• A user gives a predefined voice instruction to the system through
the microphone; the system understands the command and
executes the required function.
• It lets the user operate Windows by voice, without using
a keyboard or mouse.
4. PRINCIPLE OF SR
The smallest unit of spoken language is known as a phoneme.
The English language contains about 44 phonemes representing all the
vowels and consonants that we use for speech.
We can take the example of a typical word such as moon which can
be broken down into three phonemes: m, ue, n.
To create a speech recognition engine, a large database of models is
created to match each phoneme.
When a comparison is performed, the most likely match between
the spoken phoneme and the stored ones is determined before
further computations are performed.
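The matching step can be sketched in a few lines of Python. This is only an illustration under strong assumptions: each stored phoneme model is reduced to a single Gaussian (mean, variance) over one made-up acoustic feature, and the names PHONEME_MODELS, log_likelihood, most_likely_phoneme and recognize are hypothetical. Real engines use many features per model and far larger databases.

```python
import math

# Hypothetical stored models: one (mean, variance) pair per phoneme for a
# single invented acoustic feature. The numbers are illustrative only.
PHONEME_MODELS = {
    "m": (1.0, 0.25),
    "ue": (3.0, 0.50),
    "n": (1.6, 0.25),
}


def log_likelihood(x, mean, var):
    """Log of the Gaussian density N(x; mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def most_likely_phoneme(x, models=PHONEME_MODELS):
    """Return the phoneme whose stored model best explains feature x."""
    return max(models, key=lambda p: log_likelihood(x, *models[p]))


def recognize(features):
    """Match each observed feature value against every stored model."""
    return [most_likely_phoneme(x) for x in features]


if __name__ == "__main__":
    # Three invented feature values, one per spoken phoneme of "moon".
    print(recognize([0.9, 3.1, 1.7]))
```

Each observed value is compared with every stored model, and the phoneme with the highest likelihood is kept as the match.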
5. TYPES OF SR SYSTEM
• Speaker-dependent SR system: works by learning the unique
characteristics of a single person’s voice and depends on the speaker
for training. The user has to read a few pages of text to the
computer before using the speech recognition software. Dictation
software is typically of this kind.
• Speaker-independent SR system: designed to recognize anyone’s
voice, so no training is involved. This makes it the only real
option for applications such as interactive voice response
systems.
6. KEY TERMS
Speaking modes
Signal analyzer
Acoustic model
Language model
Digitization
Phonetics
Phonology
Semantics & pragmatics
Lexicology & syntax
Isolated words
Continuous speech
7. KEY TERMS
SIGNAL ANALYZER:
Analyses the speech signal and
removes the background noise,
thus focusing only on the
speaker’s speech.
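As a rough illustration of the idea, the sketch below drops frames whose energy is too low to contain speech. This is an assumption-laden simplification: it is a plain energy gate, not the method of any particular SR product, and the names frame_energy and drop_quiet_frames, the frame size, and the threshold are all hypothetical. Practical analyzers usually work on the frequency spectrum instead.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def drop_quiet_frames(samples, frame_size=4, threshold=0.01):
    """Keep only the frames whose mean energy exceeds the threshold.

    Frames below the threshold are assumed to hold background noise only.
    """
    kept = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if frame and frame_energy(frame) > threshold:
            kept.extend(frame)
    return kept


if __name__ == "__main__":
    # Quiet noise, then a loud burst standing in for speech, then noise again.
    signal = [0.01, -0.02, 0.01, 0.0,
              0.5, -0.4, 0.6, -0.5,
              0.02, 0.01, -0.01, 0.0]
    print(drop_quiet_frames(signal))
```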
ACOUSTIC MODEL:
Identifies phonemes from the
speech sample using a
probability-based mathematical
model.
8. KEY TERMS
LANGUAGE MODEL:
Identifies words and thus sentences
uttered by the speaker from the
phonemes by making use of a
dictionary file and grammar file.
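A minimal sketch of the dictionary lookup, assuming the dictionary file has been loaded into a Python dict that maps phoneme sequences to words. The entries, the phoneme spellings, and the function name phonemes_to_words are hypothetical, and the grammar file is left out: a real language model would also use it to choose between competing word sequences.

```python
# Hypothetical pronunciation dictionary: phoneme sequence -> word.
DICTIONARY = {
    ("m", "ue", "n"): "moon",
    ("s", "u", "n"): "sun",
    ("o", "p", "e", "n"): "open",
}


def phonemes_to_words(phonemes, dictionary=DICTIONARY):
    """Greedy longest-match segmentation of a phoneme list into words."""
    max_len = max(len(key) for key in dictionary)
    words, i = [], 0
    while i < len(phonemes):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(max_len, len(phonemes) - i), 0, -1):
            key = tuple(phonemes[i:i + length])
            if key in dictionary:
                words.append(dictionary[key])
                i += length
                break
        else:
            # No entry starts here: mark the phoneme as unknown and move on.
            words.append("<unk>")
            i += 1
    return words


if __name__ == "__main__":
    print(phonemes_to_words(["o", "p", "e", "n", "m", "ue", "n"]))
```

Greedy matching is the simplest choice here; it can pick a wrong split when one word's phonemes are a prefix of another's, which is one reason real systems score alternatives with a grammar.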
DIGITIZATION:
Analogue-to-digital conversion.
• Sampling is converting a
continuous signal into a discrete
signal.
• Quantizing is the process of
approximating a continuous
range of values with a finite
set of discrete levels.
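Both steps can be sketched in Python. The example assumes a signal given as a function of time with amplitudes in the range -1 to 1; the 440 Hz tone, the 8000 Hz sampling rate, the 3-bit depth, and the function names are illustrative choices, not values taken from any specific SR system.

```python
import math


def tone(t):
    """A continuous test signal: a 440 Hz sine wave with amplitude 1."""
    return math.sin(2 * math.pi * 440 * t)


def sample(signal, duration_s, rate_hz):
    """Sampling: read the continuous signal at evenly spaced instants."""
    n = round(duration_s * rate_hz)
    return [signal(k / rate_hz) for k in range(n)]


def quantize(value, bits, lo=-1.0, hi=1.0):
    """Quantizing: snap a value to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = (hi - lo) / (levels - 1)
    clipped = min(max(value, lo), hi)
    index = round((clipped - lo) / step)
    return lo + index * step


if __name__ == "__main__":
    samples = sample(tone, duration_s=0.01, rate_hz=8000)
    digital = [quantize(s, bits=3) for s in samples]
    print(len(samples), "samples,", len(set(digital)), "distinct levels used")
```

A higher sampling rate captures faster changes in the signal, and more bits per sample reduce the rounding error introduced by quantizing.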
9. KEY TERMS
PHONETICS:
It studies the sounds of human speech and their variability.
PHONOLOGY:
It is concerned with recognizing individual sound distinctions: the
systematic use of sound to encode meaning in any spoken human language.
SEMANTICS & PRAGMATICS:
• Semantics deals with meaning.
• Pragmatics is concerned with bridging the explanatory gap
between sentence meaning and the speaker’s meaning.
10. KEY TERMS
LEXICOLOGY & SYNTAX:
• Lexicology is that part of linguistics which studies words,
their nature & meaning.
• Syntax describes the arrangement of words and phrases to
create well-formed sentences.
12. HOW DO HUMANS DO IT?
First, articulation produces
sound waves, which the
ear conveys to the brain
for processing.
13. APPLICATIONS
Military (high-performance aircraft, helicopters)
People with disabilities
Dyslexic people
Computer & video games (Microsoft Xbox and Sony PS2
consoles offer games with speech input and output)
Medical transcription
Mobile phone devices
Voice security system
14. FUTURE SCOPE
Accuracy will continue to improve.
Small hand-held writing tablets for computer speech recognition
dictation and data entry will be developed as faster processors and
more memory become available.
Greater use will be made of “intelligent systems” which will
attempt to guess what the speaker intended to say, rather than what
was actually said, as people often misspeak and make
unintentional mistakes.
Microphone and sound systems will be designed to adapt more
quickly to changing background noise levels, different
environments, with better recognition of extraneous material to
be discarded.
17. KEY CHALLENGES
SR systems have to deal with a large number of challenges,
such as:
The speaker’s voice is often accompanied by surrounding
noise, which makes accurate recognition difficult.
A speaker may speak a number of different words, and all
of these words have to be accurately recognized.
Accents vary from person to person, and this
is a very big challenge.
A speaker may say something very quickly, and all of the
words spoken have to be individually recognized accurately.