Google Voice-to-text
November 13, 2017
Why this seminar?
- Speech recognition is one of the fastest-growing engineering
technologies.
- Nearly 20% of the world's people live with some form of
disability; many of them are blind or unable to use their
hands effectively. They can share information with others by
operating a computer through voice input.
- The system shown in this seminar can recognize speech and convert
the input audio into text; it also lets a user perform
operations such as opening Calculator, WordPad, or Notepad, or
logging off the computer.
- Speech recognition also has powerful applications in entertainment.
Applications
● In-car systems
● Health care
● Military
● Training air traffic controllers
● Telephony and other domains
● Usage in education and daily life
● Entertainment
Performance
The performance of speech recognition systems is usually evaluated in terms of
accuracy and speed. Accuracy is usually rated with word error rate (WER), whereas
speed is measured with the real time factor. Other measures of accuracy include
Single Word Error Rate (SWER) and Command Success Rate (CSR).
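As an illustration of the two main measures, here is a minimal Python sketch. It assumes the common definitions: WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference, and the real time factor is processing time divided by audio duration. The function names are hypothetical, chosen for this example only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word was dropped
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Values below 1.0 mean recognition runs faster than real time."""
    return processing_seconds / audio_seconds
```

For example, recognizing "open the calculator" as "open calculator" drops one of three reference words, so the WER is one third.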
Accuracy
The accuracy of speech recognition varies with the following:
● Vocabulary size and confusability
● Speaker dependence vs. independence
● Isolated, discontinuous, or continuous speech
● Task and language constraints
● Read vs. spontaneous speech
System block diagram
Acoustic Model
An acoustic model is created by taking audio recordings of speech, and their text transcriptions, and using
software to create statistical representations of the sounds that make up each word. It is used by a speech
recognition engine to recognize speech.
Language Model
A language model is a file containing the probabilities of sequences of words. Language models are used
for dictation applications, whereas grammars are used in desktop command and control or telephony
interactive voice response (IVR) type applications.
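To make "probabilities of sequences of words" concrete, here is a small Python sketch of a bigram language model estimated from raw counts. The three-sentence corpus and the function names are made up for illustration; a real model is trained on far more text and uses smoothing so that unseen word pairs do not get probability zero.

```python
from collections import Counter


def train_bigram_model(sentences):
    """Estimate P(next_word | word) from raw counts (no smoothing)."""
    history_counts = Counter()
    bigram_counts = Counter()
    for sentence in sentences:
        # <s> and </s> mark the sentence start and end.
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        history_counts.update(words[:-1])
        bigram_counts.update(zip(words, words[1:]))
    return {
        pair: count / history_counts[pair[0]]
        for pair, count in bigram_counts.items()
    }


def sentence_probability(model, sentence):
    """Multiply the bigram probabilities along the sentence."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    probability = 1.0
    for pair in zip(words, words[1:]):
        probability *= model.get(pair, 0.0)  # unseen pair: probability 0
    return probability


# Hypothetical toy corpus of voice commands.
corpus = ["open the calculator", "open the notepad", "close the notepad"]
model = train_bigram_model(corpus)
```

With this corpus, a recognizer that hesitates between two similar-sounding candidates would prefer the one whose word sequence the model scores higher.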
Speech Engine
A text-to-speech (TTS) engine is software that gives your computer the ability to play back text in a
spoken voice. It is the counterpart of the speech recognition engine, which converts speech to text.
Powerful Speech Recognition with Google Cloud
Google Cloud Speech API enables developers to convert audio to
text by applying powerful neural network models in an easy-to-use
API. The API recognizes over 110 languages and variants, to
support your global user base. You can transcribe the speech of
users dictating to an application's microphone, enable
command-and-control through voice, or transcribe audio files, among
many other use cases. You can recognize audio uploaded in the request
or integrate with your audio storage on Google Cloud Storage, using
the same technology Google uses to power its own products.
https://cloud.google.com/speech/
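The two input modes mentioned above (audio uploaded in the request, or audio stored on Google Cloud Storage) can be sketched as the JSON body of a synchronous recognize call. This is a sketch under assumptions, not the official client: the endpoint and field names follow the v1 REST interface as best understood here and should be checked against the current reference, the helper name and default values are hypothetical, and authentication and the actual HTTP call are left out.

```python
import base64
import json

# Assumed endpoint of the v1 synchronous recognize method.
RECOGNIZE_URL = "https://speech.googleapis.com/v1/speech:recognize"


def build_recognize_request(audio_bytes=None, gcs_uri=None,
                            language_code="en-US", sample_rate_hertz=16000):
    """Build the JSON body for inline audio or for a Cloud Storage URI."""
    if (audio_bytes is None) == (gcs_uri is None):
        raise ValueError("provide exactly one of audio_bytes or gcs_uri")
    if gcs_uri is not None:
        # Audio already stored on Google Cloud Storage, e.g. gs://bucket/file
        audio = {"uri": gcs_uri}
    else:
        # Inline audio is sent base64-encoded inside the JSON body.
        audio = {"content": base64.b64encode(audio_bytes).decode("ascii")}
    return {
        "config": {
            "encoding": "LINEAR16",  # assumes 16-bit linear PCM input
            "sampleRateHertz": sample_rate_hertz,
            "languageCode": language_code,
        },
        "audio": audio,
    }


# Hypothetical bucket and object name, for illustration only.
body = build_recognize_request(gcs_uri="gs://example-bucket/talk.raw")
payload = json.dumps(body)
```

The resulting `payload` would be sent as an authenticated POST to `RECOGNIZE_URL`; the response is expected to carry the transcription alternatives for the audio.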
Applying the API to create subtitles for a video
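Once each stretch of speech has been transcribed, the text still has to be written in a subtitle format. The Python sketch below assumes the recognition results have already been reduced to (start seconds, end seconds, text) tuples and writes them as SubRip (SRT) text; the function names are hypothetical and this is not taken from any particular tool's code.

```python
def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp: HH:MM:SS,mmm."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Turn (start, end, text) tuples into numbered SRT subtitle blocks."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text}\n"
        )
    # SRT separates subtitle blocks with a blank line.
    return "\n".join(blocks)


# Hypothetical recognized segments, for illustration only.
srt_text = to_srt([(0.0, 1.5, "Hello"), (1.5, 3.0, "world")])
```

Saving `srt_text` to a file with the `.srt` extension next to the video lets most players load it as subtitles.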
Demo and Q&A
Thank you <3
Reference application: autosub, https://github.com/agermanidis/autosub
