Automatic Speech Recognition

Types of ASR
Approaches to ASR
ASR (Automatic Speech Recognition)
What Is Voice Recognition?
What Is Voice?
Process of Voice Recognition
Why Are Voices Different?
Components of Sound
How Speech Recognition Works

Applications of Speech Processing
Process of Speech Production
Classification of Speech Sounds
Approaches to Speech Recognition

• The voice consists of the sounds made by a human being using the vocal folds for talking, singing, laughing, crying, screaming, etc.

Voice recognition is the process of converting voice into electrical signals.
The signals are then transformed into a CODING PATTERN.

The first ASR device was used in 1952 and recognized single digits spoken by a user.

ASR approaches

TEMPLATE MATCHING
Template matching is the simplest technique and has the highest accuracy when used properly, but it also suffers from the most limitations.

FEATURE ANALYSIS
A more general form of voice recognition is available through feature analysis, and this technique usually leads to "speaker-independent" voice recognition.

Template Matching
• It is SPEAKER DEPENDENT.
• It matches the input voice against previously saved templates.
• The system must be trained before it is used.
• The user must speak the same words that are available in the templates.
• Recognition accuracy can be about 98 percent.
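The template-matching idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature vectors are made-up three-number summaries of an utterance, and the names `enroll` and `recognize` are hypothetical, not taken from the slides. Training stores one averaged template per word, and recognition picks the stored template closest to the input.

```python
import math

def enroll(templates, word, feature_vectors):
    """Training step: store the average of several utterances as the template."""
    count = len(feature_vectors)
    size = len(feature_vectors[0])
    templates[word] = [sum(v[i] for v in feature_vectors) / count for i in range(size)]

def recognize(templates, features):
    """Return (word, distance) for the stored template closest to `features`."""
    best_word, best_dist = None, math.inf
    for word, template in templates.items():
        dist = math.dist(template, features)  # Euclidean distance
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word, best_dist

# Hypothetical feature vectors; a real system would derive them from audio.
templates = {}
enroll(templates, "yes", [[1.0, 0.2, 0.1], [0.9, 0.3, 0.1]])
enroll(templates, "no", [[0.1, 0.8, 0.7], [0.2, 0.9, 0.6]])

word, dist = recognize(templates, [0.95, 0.25, 0.12])
print(word)  # yes
```

Because the templates come from one user's own recordings, a different speaker's features may land far from every template, which is why this approach is speaker dependent.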
Feature Analysis
• It is SPEAKER INDEPENDENT.
• First, the given voice input is processed using LPC (Linear Predictive Coding).
• The system then attempts to find similarities between the expected input and the digitized input.
• Recognition accuracy for speaker-independent systems is somewhat less than for speaker-dependent systems, usually between 90 and 95 percent.
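As a rough illustration of the LPC step, the sketch below estimates linear-prediction coefficients for one frame with the autocorrelation method and the Levinson-Durbin recursion. The slides do not say which LPC variant is used, so this choice is an assumption, and the function names are hypothetical.

```python
def autocorrelation(frame, max_lag):
    """Autocorrelation of the frame for lags 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc(frame, order):
    """Return (coeffs, error) where x[n] is predicted as
    coeffs[0]*x[n-1] + coeffs[1]*x[n-2] + ... (Levinson-Durbin recursion)."""
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        if error == 0:
            break  # silent frame: nothing left to predict
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        error *= 1 - k * k
    return a[1:], error

# Synthetic frame in which each sample is 0.9 times the previous one.
frame = [0.9 ** n for n in range(200)]
coeffs, error = lpc(frame, order=2)
print(coeffs)  # first coefficient close to 0.9, second close to 0
```

The resulting coefficients describe the shape of the spectrum rather than the exact waveform, which is one reason they are used as features that carry over between speakers.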

Speech production: text -> phonemes -> articulatory motions -> speaking -> acoustic waveform
Speech recognition: acoustic waveform -> spectrum analysis -> feature extraction -> coding -> phonemes/words/sentences -> semantics
Input may be discrete or continuous.

• Vocal Tract: consists of the laryngeal pharynx, oral pharynx, oral cavity, nasal cavity, and nasal pharynx.
• Spectrum Analysis: MFCCs (mel-frequency cepstral coefficients) are used to produce the voice features. DTW (dynamic time warping) is used to select the pattern that matches the database (MATLAB).
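The slides mention a MATLAB implementation. The sketch below shows the DTW matching step in Python instead, assuming the features have already been extracted; one-number frames stand in for real MFCC vectors, and the database contents are invented for illustration.

```python
import math

def dtw_distance(seq_a, seq_b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed alignment steps.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def best_match(database, query):
    """Return the database word whose stored pattern is closest to the query."""
    return min(database, key=lambda word: dtw_distance(database[word], query))

# Hypothetical stored patterns (one feature per frame, for brevity).
database = {
    "one": [[1.0], [2.0], [3.0], [2.0]],
    "two": [[3.0], [1.0], [0.0], [1.0]],
}
query = [[1.0], [1.0], [2.0], [3.0], [3.0], [2.0]]  # "one", spoken more slowly
print(best_match(database, query))  # one
```

DTW is useful here because the same word is rarely spoken at the same speed twice; the warping lets a six-frame utterance align with a four-frame stored pattern.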

• Acoustic Model: provides the acoustic sounds of a language and can be trained to recognize the characteristics of a particular user's speech patterns and acoustic environment.

• To make pattern recognition possible, the PCM (pulse-code modulation) signal is transformed into the frequency domain.
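A minimal sketch of that transformation, assuming a plain discrete Fourier transform over one short frame of PCM samples. Real systems use a fast Fourier transform; the sample rate and test tone below are illustrative values only.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude of each DFT bin from 0 Hz up to half the sample rate."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2 + 1):
        total = sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t in range(n))
        magnitudes.append(abs(total))
    return magnitudes

sample_rate = 8000   # samples per second (assumed)
frame_len = 64
# Synthetic PCM frame: a pure 1000 Hz tone.
pcm = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(frame_len)]

mags = dft_magnitudes(pcm)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
peak_hz = peak_bin * sample_rate / frame_len
print(peak_hz)  # 1000.0
```

In the time domain the frame is just 64 numbers; in the frequency domain it is a single strong peak, which is a much easier pattern to compare.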

Types of ASR:
• Speaker Dependent
• Speaker Independent
• Discrete Speech Recognition
• Continuous Speech Recognition
• Natural Language

Components of sound:
• Pitch
• Timbre
• Harmonics
• Loudness
• Rhythm
• Attack
• Sustain
• Decay
• Speed

• COMPRESSION: a region in which particles are crowded together; it appears as an upward curve in the line.
• RAREFACTION: a region in which particles are spread apart; it appears as a downward curve in the line.
• WAVELENGTH: the distance from the crest of one wave to the crest of the next.

• FREQUENCY: the number of waves that pass a point in each second.
• AMPLITUDE: a measure of the amount of energy in a sound wave.
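These quantities are related: wavelength equals the speed of sound divided by the frequency, and the energy of a wave grows with the square of its amplitude. The sketch below works through both, assuming sound travelling in air at about 343 m/s; the function names are illustrative.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees Celsius

def wavelength_m(frequency_hz, speed_m_s=SPEED_OF_SOUND_M_S):
    """Crest-to-crest distance of a wave with the given frequency."""
    return speed_m_s / frequency_hz

def rms_amplitude(samples):
    """Root-mean-square amplitude of a sampled wave."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

print(wavelength_m(100.0))   # about 3.43 m (low frequency, long wave)
print(wavelength_m(1000.0))  # about 0.343 m (high frequency, short wave)

# One full cycle of a quiet and a loud wave at the same frequency.
quiet = [0.5 * math.sin(2 * math.pi * t / 100) for t in range(100)]
loud = [2.0 * math.sin(2 * math.pi * t / 100) for t in range(100)]
print(rms_amplitude(loud) / rms_amplitude(quiet))  # close to 4
```

The loud wave has four times the amplitude of the quiet one, so it carries roughly sixteen times the energy.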

Pitch is how high or low a sound seems: a high-frequency sound wave has a high pitch, and a low-frequency sound wave has a low pitch.
A bird makes a high-pitched sound.
A lion makes a low-pitched sound.
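One common way to measure pitch is to find the delay at which a signal best matches a shifted copy of itself (autocorrelation). The sketch below is a simplified version of that idea; the synthetic tones merely stand in for a high and a low voice, and the 50 to 500 Hz search range is an assumption.

```python
import math

def estimate_pitch(samples, sample_rate, min_hz=50.0, max_hz=500.0):
    """Estimate the fundamental frequency from the strongest autocorrelation lag."""
    min_lag = int(sample_rate / max_hz)
    max_lag = int(sample_rate / min_hz)
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # Similarity between the signal and itself delayed by `lag` samples.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

def tone(frequency_hz, sample_rate=8000, length=800):
    """Synthetic pure tone used as a stand-in for a recorded sound."""
    return [math.sin(2 * math.pi * frequency_hz * t / sample_rate) for t in range(length)]

high = estimate_pitch(tone(400), 8000)  # stand-in for a high-pitched sound
low = estimate_pitch(tone(100), 8000)   # stand-in for a low-pitched sound
print(high, low)  # 400.0 100.0
```

Real voices are not pure tones, so practical pitch trackers add normalization and smoothing on top of this basic search.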

Voices differ because of:
• INTENSITY (depends on amplitude)
• PITCH (depends on frequency)
• TONE (pleasant or unpleasant)

How speech recognition works:
• Divide the sound wave into evenly spaced blocks.
• Process each block for important characteristics, such as strength across various frequency ranges, number of zero crossings, and total energy.
• Using this characteristic vector, attempt to associate each block with a phone, which is the most basic unit of speech, producing a string of phones.
• Find the word whose model is the most likely match to the string of phones that was produced.
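The first two steps above can be sketched as follows. This is a minimal illustration that computes only two of the characteristics mentioned (zero crossings and total energy) and leaves out the per-frequency-range strengths; the tiny sample list and the function names are made up for the example.

```python
def frame_signal(samples, frame_len):
    """Split the signal into evenly spaced, non-overlapping blocks."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]

def zero_crossings(frame):
    """Count how often the signal changes sign between neighbouring samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

def total_energy(frame):
    """Sum of squared sample values in the block."""
    return sum(s * s for s in frame)

def feature_vector(frame):
    """Characteristic vector for one block: (energy, zero crossings)."""
    return (total_energy(frame), zero_crossings(frame))

samples = [1, -1, 1, -1, 2, 2, 2, 2]
features = [feature_vector(f) for f in frame_signal(samples, frame_len=4)]
print(features)  # [(4, 3), (16, 0)]
```

The first block is quiet but changes sign constantly, while the second is loud and steady; a later stage would use such differences to assign each block to a phone.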

• Transform the PCM signal into an acoustic representation
• Apply the GRAMMAR
• Figure out which PHONEMES are spoken
• Convert the PHONEMES into WORDS
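The last step, converting phonemes into words, can be illustrated with a small pronunciation dictionary and an edit-distance search. The lexicon below is hypothetical and uses ARPAbet-style symbols purely for illustration; a real recognizer would use probabilistic word models instead of a plain distance.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of phoneme symbols."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute or match
        prev = curr
    return prev[-1]

def phonemes_to_word(lexicon, phonemes):
    """Return the lexicon word whose pronunciation is closest to `phonemes`."""
    return min(lexicon, key=lambda word: edit_distance(lexicon[word], phonemes))

# Hypothetical lexicon mapping words to phoneme sequences.
lexicon = {
    "cat": ["K", "AE", "T"],
    "cap": ["K", "AE", "P"],
    "bat": ["B", "AE", "T"],
    "dog": ["D", "AO", "G"],
}
print(phonemes_to_word(lexicon, ["K", "EH", "T"]))  # cat
```

Allowing a small distance rather than requiring an exact match lets the lookup tolerate a misrecognized phoneme, as with the vowel in this example.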

Approaches to speech recognition:
• Acoustic Phonetic Approach
• Pattern Recognition Approach (HMM)
• Artificial Intelligence Approach (Neural Networks)
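To give a feel for the HMM-based pattern recognition approach, here is a toy sketch of Viterbi decoding, the standard way to find the most likely hidden state sequence in an HMM. The two states, the observation symbols, and all probabilities are invented for illustration and are not taken from the slides.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (path, probability) of the most likely hidden state sequence."""
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best previous state from which to enter state s.
            prev = max(states, key=lambda p: best[-1][p] * trans_p[p][s])
            scores[s] = best[-1][prev] * trans_p[prev][s] * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]

# Toy model: hidden sound classes emit coarse zero-crossing-rate symbols.
states = ("voiced", "unvoiced")
start_p = {"voiced": 0.5, "unvoiced": 0.5}
trans_p = {"voiced": {"voiced": 0.8, "unvoiced": 0.2},
           "unvoiced": {"voiced": 0.3, "unvoiced": 0.7}}
emit_p = {"voiced": {"low": 0.9, "high": 0.1},
          "unvoiced": {"low": 0.2, "high": 0.8}}

path, prob = viterbi(["high", "high", "low", "low"], states, start_p, trans_p, emit_p)
print(path)  # ['unvoiced', 'unvoiced', 'voiced', 'voiced']
```

A full recognizer would use one such model per phone or word and work with log probabilities to avoid underflow on long utterances.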

Speech Processing
• Analysis/Synthesis
• Coding
• Recognition
  • Speaker Recognition
  • Language Identification
  • Speech Recognition
    • Speech Mode: isolated speech, continuous speech
    • Speaker Mode: speaker dependent, speaker independent, speaker adaptive
    • Vocabulary Size: small, medium, large
    • Speaking Style: dictation, spontaneous

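Voiced Sound
• The vocal cords play an active role in the production of the sound, e.g. /a/, /e/, /i/.
• It has high energy, concentrated at lower frequencies.
Unvoiced Sound
• The vocal cords are inactive, e.g. /s/, /f/.
• It is produced by air pressure building up and being released through a constriction in the vocal tract.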

Applications of speech processing:
• Speech Coding
• Speech Recognition
• Speech Verification/Identification
• Speech Enhancement (removing background noise)
• Speech Synthesis

• Grammar Design
• Signal Processing
• Phonemic Recognition
• Word Recognition
• Result Recognition
