2. Why this seminar?
- Speech recognition is one of the fastest-growing
engineering technologies.
- Nearly 20% of the world's people live with some form of
disability; many of them are blind or unable to use their
hands effectively. Voice input lets them operate a computer
and share information with other people.
- The system presented in this seminar recognizes speech and
converts the input audio into text; it also lets a user perform
operations such as opening Calculator, WordPad, or Notepad, or
logging off the computer.
- Speech recognition also has powerful applications in the field of entertainment.
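
The voice-command idea above can be sketched as a lookup from recognized text to a program to launch. This is only an illustrative sketch, not the seminar's implementation: the phrase table, the `match_command` helper, and the Windows program names (`calc.exe`, `write.exe`, `notepad.exe`, `shutdown /l`) are assumptions chosen for the example.

```python
import shlex

# Hypothetical phrase-to-command table. The Windows program names are
# assumptions for illustration, not taken from the seminar.
COMMANDS = {
    "open calculator": "calc.exe",
    "open wordpad": "write.exe",
    "open notepad": "notepad.exe",
    "log off": "shutdown /l",
}


def match_command(transcript):
    """Return the argument list for the first known phrase found in the
    transcript, or None when no phrase matches."""
    # Lower-case and collapse whitespace so "OPEN   Notepad" still matches.
    text = " ".join(transcript.lower().split())
    for phrase, command in COMMANDS.items():
        if phrase in text:
            return shlex.split(command)
    return None
```

The returned list could then be passed to `subprocess.run` on a Windows machine; the sketch stops short of launching anything so that it can be tried safely.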
3. Applications
● In-car systems
● Health care
● Military
● Training air traffic controllers
● Telephony and other domains
● Usage in education and daily life
● Entertainment
4. Performance
The performance of speech recognition systems is usually evaluated in terms of
accuracy and speed. Accuracy is usually rated with word error rate (WER), whereas
speed is measured with the real-time factor (RTF). Other measures of accuracy include
Single Word Error Rate (SWER) and Command Success Rate (CSR).
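
The two main measures above can be sketched in a few lines. WER is commonly computed as the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the recognizer output, divided by the number of reference words; the real-time factor is processing time divided by audio duration. The function names below are illustrative, and splitting on whitespace is a simplifying assumption.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds, audio_seconds):
    """RTF below 1.0 means the recognizer runs faster than real time."""
    return processing_seconds / audio_seconds
```

For example, comparing the reference "open the calculator now" with the output "open calculator how" gives one deletion and one substitution over four reference words, so the WER is 0.5.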
5. Accuracy
Accuracy of speech recognition varies with the following:
● Vocabulary size and confusability
● Speaker dependence vs. independence
● Isolated, discontinuous, or continuous speech
● Task and language constraints
● Read vs. spontaneous speech
7. Acoustic Model
An acoustic model is created by taking audio recordings of speech, and their text transcriptions, and using
software to create statistical representations of the sounds that make up each word. It is used by a speech
recognition engine to recognize speech.
8. Language Model
A language model is a file containing the probabilities of sequences of words. Language models are used
for dictation applications, whereas grammars are used in desktop command and control or telephony
interactive voice response (IVR) type applications.
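
To make "probabilities of sequences of words" concrete, here is a minimal sketch of a bigram language model estimated by simple counting. It is a toy under stated assumptions: the function names, the `<s>` and `</s>` sentence markers, and the lack of smoothing are choices made for the example, and real dictation models are far larger.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Estimate P(word | previous word) by counting adjacent word pairs."""
    context_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # <s> and </s> mark the sentence start and end.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, word in zip(words, words[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return {
        pair: count / context_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


def sentence_probability(model, sentence):
    """Multiply bigram probabilities; unseen word pairs get probability 0."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for pair in zip(words, words[1:]):
        prob *= model.get(pair, 0.0)
    return prob
```

Trained on the three sentences "open notepad", "open calculator", and "close notepad", the model gives "open notepad" a probability of 2/3 × 1/2 × 1 = 1/3, while the unseen order "notepad open" scores 0.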
9. Speech Engine
A speech engine is software that gives your computer the ability to play back text in a spoken voice
(referred to as text-to-speech or TTS).
10. Powerful Speech Recognition with Google Cloud
The Google Cloud Speech API enables developers to convert audio to
text by applying powerful neural network models in an easy-to-use
API. The API recognizes over 110 languages and variants, to
support your global user base. You can transcribe the text of
users dictating to an application’s microphone, enable command-
and-control through voice, or transcribe audio files, among many
other use cases. Recognize audio uploaded in the request, and
integrate with your audio storage on Google Cloud Storage, by
using the same technology Google uses to power its own products.
https://cloud.google.com/speech/
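
The two input paths described above (audio uploaded in the request, or audio stored on Google Cloud Storage) can be sketched as JSON request bodies for the service's REST `speech:recognize` method. This is a hedged sketch: the helper name `build_recognize_request`, the default encoding, and the sample rate are assumptions for illustration; the endpoint and field names follow the v1 REST API as I understand it and should be checked against the current documentation. Authentication and the HTTP call itself are left out.

```python
import base64
import json

# Assumed v1 REST endpoint; verify against the current documentation.
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio_bytes=None, gcs_uri=None,
                            language_code="en-US", sample_rate_hertz=16000):
    """Build the JSON body for a recognize call.

    Pass raw audio bytes to upload audio in the request, or a gs:// URI to
    point at a file already stored on Google Cloud Storage.
    """
    if (audio_bytes is None) == (gcs_uri is None):
        raise ValueError("provide exactly one of audio_bytes or gcs_uri")

    if gcs_uri is not None:
        audio = {"uri": gcs_uri}
    else:
        # Inline audio is sent as base64 text inside the JSON body.
        audio = {"content": base64.b64encode(audio_bytes).decode("ascii")}

    body = {
        "config": {
            "encoding": "LINEAR16",  # assumed: 16-bit PCM audio
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        "audio": audio,
    }
    return json.dumps(body)
```

The resulting string would be sent as the body of an authenticated POST to `RECOGNIZE_URL`; the bucket and file names used with `gcs_uri` are up to the caller.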