Google Voice-to-text

API of google

  1. Google Voice-to-text, November 13, 2017
  2. Why this seminar? - Speech recognition is one of the fastest-growing engineering technologies. - Nearly 20% of the world's people have some form of disability; many of them are blind or unable to use their hands effectively, and they can share information with others by operating a computer through voice input. - The system presented in this seminar can recognize speech and convert the input audio into text; it also lets a user perform operations such as opening Calculator, WordPad, or Notepad, or logging off the computer. - Speech recognition also has powerful applications in entertainment.
  3. Applications ● In-car systems ● Health care ● Military ● Training air traffic controllers ● Telephony and other domains ● Education and daily life ● Entertainment
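  4. Performance: The performance of a speech recognition system is usually evaluated in terms of accuracy and speed. Accuracy is usually rated with the word error rate (WER): the number of word substitutions, deletions, and insertions divided by the number of words in the reference transcript. Speed is measured with the real time factor: processing time divided by the duration of the audio. Other measures of accuracy include Single Word Error Rate (SWER) and Command Success Rate (CSR).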
  5. Accuracy: The accuracy of speech recognition varies with the following: ● Vocabulary size and confusability ● Speaker dependence vs. independence ● Isolated, discontinuous, or continuous speech ● Task and language constraints ● Read vs. spontaneous speech
  6. System block diagram
  7. Acoustic Model: An acoustic model is created by taking audio recordings of speech and their text transcriptions, then using software to build statistical representations of the sounds that make up each word. A speech recognition engine uses it to recognize speech.
  8. Language Model: A language model is a file containing the probabilities of sequences of words. Language models are used for dictation applications, whereas grammars are used in desktop command-and-control or telephony interactive voice response (IVR) applications.
  9. Speech Engine: A speech engine is software that gives your computer the ability to play back text in a spoken voice (referred to as text-to-speech, or TTS).
  10. Powerful speech recognition with Google Cloud: The Google Cloud Speech API lets developers convert audio to text by applying powerful neural network models through an easy-to-use API. The API recognizes over 110 languages and variants, to support a global user base. You can transcribe the speech of users dictating into an application's microphone, enable command-and-control through voice, or transcribe audio files, among many other use cases. It can recognize audio uploaded in the request and integrates with audio stored on Google Cloud Storage, using the same technology Google uses to power its own products. https://cloud.google.com/speech/
  11. Applying the API to create subtitles for a video
  12. Demo and Q&A. Thank you <3 Reference application: autosub, https://github.com/agermanidis/autosub
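
The two measures named on slide 4 can be illustrated with a short sketch. This is not code from the seminar: the function names `word_error_rate` and `real_time_factor` are made up for this example, words are split on whitespace with no text normalization, and WER is computed as the word-level edit distance divided by the reference length, which is the common textbook definition.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper for illustration; splits on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # One of three reference words is dropped by the recognizer.
    print(word_error_rate("open the calculator", "open calculator"))
    # 2 seconds of processing for 10 seconds of audio.
    print(real_time_factor(2.0, 10.0))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference; a real evaluation would usually also normalize case and punctuation before comparing.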
  • ssuser8112af

    Dec. 21, 2019
