2. A voice command program (VCP) is controlled by means of the
human voice. By removing the need to use buttons, it lets users
operate their system easily with their hands full or while doing
other tasks. It is also capable of responding to several
commands at once.
It can understand around 50 different commands. VCPs can be
found in computer operating systems, commercial software for
computers, mobile phones, cars, call centers, and internet
search engines such as Google.
Voice command devices are becoming more widely
available, and innovative ways of using the human voice
are always being created.
3. Speech recognition systems can be categorized by many parameters:
1. SPEAKER INDEPENDENCE
Some SR systems use "speaker independent speech recognition" while others
use "training" where an individual speaker reads sections of text into the SR
system. These systems analyze the person's specific voice and use it to fine
tune the recognition of that person's speech, resulting in more accurate
transcription. Systems that do not use training are called "speaker independent"
systems. Systems that use training are called "speaker dependent" systems.
2. An isolated-word (discrete) speech recognition system requires
that the speaker pause briefly between words, whereas a
continuous speech recognition system does not.
3. Spontaneous speech contains disfluencies, pauses
and restarts, and is much more difficult to recognise than speech
read from a script.
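The pause requirement of an isolated-word system can be illustrated with a minimal sketch. It assumes the audio has already been reduced to one energy value per frame, and it splits the frames into word segments wherever the energy stays low for several frames in a row. The function name `split_on_pauses`, the `threshold` and `min_pause` values, and the sample energies are all hypothetical and chosen only for illustration.

```python
def split_on_pauses(frame_energy, threshold=0.1, min_pause=3):
    """Return (start, end) frame index pairs for segments separated by pauses.

    A pause is a run of at least `min_pause` frames below `threshold`.
    `end` is exclusive, so frame_energy[start:end] is one segment.
    """
    segments = []
    start = None      # first frame of the segment being built
    last_loud = None  # most recent frame at or above the threshold
    for i, energy in enumerate(frame_energy):
        if energy >= threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_pause:
            # The quiet run is long enough to count as a pause.
            segments.append((start, last_loud + 1))
            start = None
    if start is not None:
        # Close a segment that runs to the end of the input.
        segments.append((start, last_loud + 1))
    return segments


# Two "words" separated by a long pause; the short dip at frame 11
# is too brief to count as a pause, so it stays inside the second word.
energies = [0, 0, 0.5, 0.6, 0.4, 0, 0, 0, 0, 0.7, 0.8, 0.05, 0.6, 0, 0, 0]
words = split_on_pauses(energies)
print(words)
```

A continuous speech recognizer cannot rely on this kind of segmentation, because fluent speech often has no low-energy gap between words.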
4. Various algorithms are used for speech recognition:
1. HMM - A hidden Markov model (HMM) is a statistical Markov model in which the
system being modeled is assumed to be a Markov process with unobserved
(hidden) states. It is closely related to earlier work on the optimal nonlinear filtering
problem (stochastic processes). Hidden Markov models are especially known for
their application in temporal pattern recognition such as speech, handwriting,
gesture recognition, part-of-speech tagging, musical score following, partial
discharges and bioinformatics.
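As a rough illustration of how hidden states are recovered from observations, the sketch below implements Viterbi decoding, a standard way to find the most likely hidden-state sequence of an HMM. The two states ("silence", "speech"), the two observation symbols ("low", "high" energy), and every probability are made-up values for this example, not figures from any real recognizer.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (probability, path) of the most likely hidden-state sequence."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, prob = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # Pick the best final state, then follow the back-pointers.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return best[-1][last], path


# Hypothetical toy model: is each frame silence or speech?
states = ["silence", "speech"]
start_p = {"silence": 0.8, "speech": 0.2}
trans_p = {
    "silence": {"silence": 0.7, "speech": 0.3},
    "speech": {"silence": 0.2, "speech": 0.8},
}
emit_p = {
    "silence": {"low": 0.9, "high": 0.1},
    "speech": {"low": 0.2, "high": 0.8},
}

prob, path = viterbi(["low", "high", "high", "low"], states, start_p, trans_p, emit_p)
print(path)  # ['silence', 'speech', 'speech', 'silence']
```

In a speech recognizer the hidden states would typically correspond to parts of phones and the observations to acoustic feature vectors, but the decoding idea is the same.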
2. ARTIFICIAL NEURAL NETWORKS - The inspiration for neural networks came
from examination of central nervous systems. In an artificial neural network,
simple artificial nodes, called "neurons", "neurodes", "processing elements" or
"units", are connected together to form a network which mimics a biological neural
network. There is no single formal definition of what an artificial neural network is.
Commonly, though, a class of statistical models will be called "neural" if
they consist of sets of adaptive weights, i.e. numerical parameters that are tuned
by a learning algorithm, and are capable of approximating non-linear functions of
their inputs. The adaptive weights are conceptually connection strengths between
neurons, which are activated during training and prediction.
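The idea of adaptive weights tuned by a learning algorithm can be shown in its smallest form: a single artificial neuron trained with the perceptron rule to reproduce logical AND. This is only a sketch under simplifying assumptions. One neuron can separate only linearly separable inputs, whereas practical networks connect many neurons in layers. The function names, learning rate, and epoch count are illustrative choices.

```python
def predict(weights, bias, inputs):
    """Fire (return 1) when the weighted sum of the inputs exceeds zero."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, rate=1):
    """Tune the weights and bias with the perceptron learning rule."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # Strengthen or weaken each connection in proportion to its input.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias


# Training data for logical AND: the output is 1 only when both inputs are 1.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, x) for x, _ in and_samples]
print(predictions)  # [0, 0, 0, 1]
```

The weights start at zero and change only when the neuron makes a mistake, which is the sense in which they are "adaptive".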
5. Project Perspective
The purposes of a voice command application are
manifold.
•It increases productivity by facilitating multitasking.
•It can help people who have trouble using their hands.
•It can help people who have cognitive disabilities.
•It is often more convenient to speak than to write.
We have extended this application by enabling dynamic
voice commands, which can provide total control of the
system.
6. MICROSOFT SAPI
The Speech Application Programming Interface or SAPI is an API developed by
Microsoft to allow the use of speech recognition and speech synthesis within
Windows applications. To date, a number of versions of the API have been
released, which have shipped either as part of a Speech SDK, or as part of the
Windows OS itself. Applications that use SAPI include Microsoft Office,
Microsoft Agent and Microsoft Speech Server.
10. KEY ISSUES IN SPEECH RECOGNITION
1. Noise
Speech is uttered in an environment of sounds: a clock ticking, a computer humming,
a radio playing somewhere down the corridor, another human speaker in the
background, etc. This is usually called noise, i.e., unwanted information in the speech
signal. In automatic speech recognition (ASR) we have to identify and filter out these noises from the speech signal.
Another kind of noise is the echo effect, which is the speech signal bounced
2. Continuous speech
Speech has no natural pauses at word boundaries; the pauses mainly appear
on a syntactic level, such as after a phrase or a sentence. This introduces a difficult
problem for speech recognition: how should we translate a waveform into a
sequence of words? After a first stage of recognition into phones and phone
categories, we have to group them into words. Even if we disregard word boundary
ambiguity