Speech Recognition

Created By:
Kanjariya Hardik G.
Roll No : 17

Introduction
• Speech recognition technology has recently reached a
higher level of performance and robustness, allowing
users to communicate with a system by talking.
• Speech recognition is the process of decoding an acoustic
speech signal, captured by a microphone or telephone, into
a set of words.
• In this way, whole speech is recognized word by word.

Types of SR
• There are two main types of speaker models: speaker independent
and speaker dependent.
• Speaker independent models recognize the speech patterns of a large
group of people.
• Speaker dependent models recognize speech patterns from only one
person.
• Both models use mathematical and statistical formulas to yield
the best word match for speech.
• A third variation of speaker models is now emerging, called
speaker adaptive.
• Speaker adaptive systems usually begin with a speaker independent
model and adjust it more closely to each individual during a
brief training period.
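
One possible way to picture speaker adaptation is to pull a speaker independent parameter toward the values observed for one speaker. The sketch below is only an illustration under stated assumptions: the function name `adapt_mean`, the `prior_weight` parameter, and the pitch values are all made up for the example and are not taken from any particular engine.

```python
def adapt_mean(independent_mean, speaker_samples, prior_weight=10.0):
    """Shift a speaker independent mean toward one speaker's data.

    prior_weight (an assumed tuning value) controls how much the starting
    model is trusted: with few samples the result stays near
    independent_mean; with many samples it approaches the speaker's average.
    """
    n = len(speaker_samples)
    return (prior_weight * independent_mean + sum(speaker_samples)) / (prior_weight + n)


independent_mean = 120.0    # made-up average feature value across many speakers
training = [180.0] * 10     # made-up values from one speaker's brief training period

adapted = adapt_mean(independent_mean, training)
print(adapted)  # 150.0
```

With no training data the function returns the speaker independent value unchanged, which matches the idea that adaptation starts from the independent model.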

How does it work?
• Speech produces a sound pressure wave, which forms an
acoustic signal.
• The microphone receives the acoustic signal and converts
it to an analogue signal.
• To store the analogue signal, it must be converted to a
digital signal.
• A speech recognizer tries to transform a digitally
encoded acoustic signal of speech in a natural language
into text in that language.
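
The analogue-to-digital conversion can be sketched as sampling followed by quantization. This is a minimal illustration, not the behavior of any specific sound card: the 16 kHz sample rate, the 16-bit depth, and the helper names `digitize` and `tone` are assumptions chosen for the example, and a 440 Hz sine wave stands in for the microphone's analogue signal.

```python
import math

SAMPLE_RATE = 16_000   # samples per second (assumed value)
BIT_DEPTH = 16         # bits per sample (assumed value)


def digitize(analogue, duration_s, sample_rate=SAMPLE_RATE, bit_depth=BIT_DEPTH):
    """Sample a continuous signal and quantize each sample to an integer.

    `analogue` is a function of time in seconds returning a value in [-1.0, 1.0].
    """
    max_level = 2 ** (bit_depth - 1) - 1            # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                          # sampling: discrete time steps
        value = max(-1.0, min(1.0, analogue(t)))     # clip to the valid range
        samples.append(round(value * max_level))     # quantization: discrete levels
    return samples


def tone(t):
    """Stand-in for the analogue microphone signal: a 440 Hz sine wave."""
    return math.sin(2 * math.pi * 440 * t)


pcm = digitize(tone, duration_s=0.01)
print(len(pcm), min(pcm), max(pcm))
```

The resulting list of integers is the kind of digitally encoded signal a recognizer would take as input.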

Speech Waveform/Spectrogram
[Figure: waveform and spectrogram of the phrase "speech lab"; frequency axis in Hz, time axis in s]
• The spectrogram is an alternative way to characterize speech.
• The louder the sound, the greater the amplitude on the y-axis.
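
To make the idea of a spectrogram concrete, the sketch below splits a signal into short overlapping frames and measures how much energy each frequency carries in each frame. It is a minimal illustration using a naive DFT from the standard library; the frame size, hop size, 8 kHz sample rate, and function names are assumptions for the example, and a 1000 Hz test tone stands in for speech.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the first N/2 DFT bins of one frame (naive O(N^2) DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames; each frame is a list of frequency-bin magnitudes."""
    rows = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        # A Hann window reduces spectral leakage at the frame edges.
        windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (frame_size - 1)))
                    for i, x in enumerate(frame)]
        rows.append(dft_magnitudes(windowed))
    return rows


sample_rate = 8000
signal = [math.sin(2 * math.pi * 1000 * n / sample_rate) for n in range(256)]

spec = spectrogram(signal)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64   # bin index -> frequency in Hz
print(len(spec), "frames; strongest frequency in first frame:", peak_hz, "Hz")
```

Plotting the rows as columns of an image, with time on the x-axis and frequency in Hz on the y-axis, gives the usual spectrogram picture.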

Speech Recognition Process Flow

The major components
• Audio input
• Grammar
• Acoustic Model
• Recognized text

Audio I/O
• It is important to understand that the audio
stream is rarely pristine.
• It contains not only the speech data (what was
said) but also background noise.
• This noise can interfere with the recognition
process, and the speech engine must handle (and
possibly even adapt to) the environment within
which the speech is captured.
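
One simple way to separate speech from background noise is to compare the energy of each short frame against an estimated noise floor. The sketch below is an assumption-laden illustration, not how any specific engine works: the function names, the frame size, the `factor` threshold, and the assumption that the first few frames contain only noise are all made up for the example, and the audio is synthetic.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def detect_speech(samples, frame_size=80, noise_frames=3, factor=4.0):
    """Flag each frame as speech (True) or background noise (False).

    The noise floor is estimated from the first `noise_frames` frames, which
    are assumed to contain no speech; this is how the sketch adapts to the
    environment it is running in.
    """
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    energies = [frame_energy(f) for f in frames]
    noise_floor = sum(energies[:noise_frames]) / noise_frames
    threshold = noise_floor * factor
    return [e > threshold for e in energies]


# Synthetic input: a quiet 50 Hz hum throughout, plus a louder 300 Hz
# "voice" tone between samples 320 and 560.
hum = [0.05 * math.sin(2 * math.pi * 50 * n / 8000) for n in range(800)]
voice = [0.5 * math.sin(2 * math.pi * 300 * n / 8000) if 320 <= n < 560 else 0.0
         for n in range(800)]
audio = [h + v for h, v in zip(hum, voice)]

flags = detect_speech(audio)
print(flags)
```

Only the frames that cover the louder tone are flagged, so later stages could ignore the hum-only frames.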

Acoustic Model + Grammar
• Once the speech data is in the proper format, the engine
searches for the best match.
• It does this by taking into consideration the words and phrases
it knows about (the active grammars), along with its
knowledge of the environment in which it is operating.
• The knowledge of the environment is provided in the form of
an acoustic model.
• Once it identifies the most likely match for what was said, it
returns what it recognized as a text string.
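
The search for the best match can be pictured as scoring every phrase in the active grammar against the acoustic evidence and returning the winner as text. The sketch below is a toy illustration only: the grammar, the per-word scores, the `UNSEEN` penalty, and the function names are hypothetical, and real engines search far larger spaces with more elaborate models.

```python
import math

# Active grammar: the only phrases the engine will consider (hypothetical).
ACTIVE_GRAMMAR = [
    ("call", "home"),
    ("call", "office"),
    ("play", "music"),
]

# Hypothetical acoustic-model output: for each word position, the
# log-likelihood that the audio at that position matches a given word.
acoustic_scores = [
    {"call": math.log(0.6), "play": math.log(0.3), "tall": math.log(0.1)},
    {"home": math.log(0.5), "foam": math.log(0.4), "music": math.log(0.1)},
]

UNSEEN = math.log(1e-6)  # assumed score for a word the acoustic model did not propose


def phrase_score(phrase, scores):
    """Sum the per-position log-likelihoods for one candidate phrase."""
    if len(phrase) != len(scores):
        return float("-inf")
    return sum(scores[i].get(word, UNSEEN) for i, word in enumerate(phrase))


def recognize(scores, grammar=ACTIVE_GRAMMAR):
    """Return the highest-scoring in-grammar phrase as a text string."""
    best = max(grammar, key=lambda phrase: phrase_score(phrase, scores))
    return " ".join(best)


print(recognize(acoustic_scores))  # call home
```

Because only grammar phrases are scored, acoustically plausible but out-of-grammar words such as "tall" or "foam" can never appear in the result.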

About SR Engine
• SR requires a software application "engine" with built-in
logic to decipher and act on the spoken word.
• Sound card
– Converts the analogue signal to a digital signal.
• Function of the SR engine
– Converts the digital signal to phonemes, and the
phonemes to words.

• Different SR engines
– CMU Sphinx
– Microsoft SAPI
– IBM ViaVoice

Recognition Process Flow Summary
Step 1: User Input
The system captures the user's voice in the form of an
analogue acoustic signal.
Step 2: Digitization
Digitize the analogue acoustic signal.
Step 3: Phonetic Breakdown
Break the signal into phonemes.

Recognition Process Flow Summary (continued)
• Step 4: Statistical Modeling
– Map phonemes to their phonetic representation using a
statistical model.
• Step 5: Matching
– Based on the grammar, the phonetic representation, and
the dictionary, the system returns an n-best list (i.e.,
candidate words, each with a confidence score).
– Grammar: the set of words or phrases that constrains the
range of input or output in the voice application.
– Dictionary: the table that maps phonetic representations
to words (e.g., thu, thee → the).
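
The dictionary lookup and the n-best list can be sketched together. This is a hedged illustration, not an engine's actual method: the "thu" and "thee" entries come from the example above, while the other dictionary entries, the function name `n_best`, and the use of a plain string-similarity ratio as the confidence score are assumptions made for the example.

```python
from difflib import SequenceMatcher

# Dictionary: phonetic representation -> word. Only the "thu"/"thee"
# entries come from the slide; the rest are made up for the example.
DICTIONARY = {
    "thu": "the",
    "thee": "the",
    "then": "then",
    "tree": "tree",
}


def n_best(phonetic, dictionary=DICTIONARY, n=3):
    """Return up to n (word, confidence) pairs, best first.

    Confidence here is a string-similarity ratio in [0, 1]; a real engine
    would derive it from its statistical models instead.
    """
    best_per_word = {}
    for representation, word in dictionary.items():
        score = SequenceMatcher(None, phonetic, representation).ratio()
        best_per_word[word] = max(score, best_per_word.get(word, 0.0))
    ranked = sorted(best_per_word.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


results = n_best("thee")
for word, confidence in results:
    print(f"{word}: {confidence:.2f}")
```

Both "thu" and "thee" map to the same word, so the list contains "the" only once, with the better of its two scores.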

REPRESENTATION OF SOFTWARE

Challenges and Difficulties of SR
Speech recognition is still a very difficult problem. The main
problems are:
• Speaker Variability
Two speakers, or even the same speaker, will pronounce the
same word differently.
• Channel Variability
The quality and position of the microphone and the background
environment will affect the output.

Current Software Options for PC
• Dragon Systems – NaturallySpeaking
• Philips – FreeSpeech
• IBM – ViaVoice
• Lernout & Hauspie – Voice Xpress
