SlideShare a Scribd company logo
1 of 14
SPEECH TO TEXT
CONVERSION
Why this project?
 Speech recognition technology is one from the fast growing
engineering technologies.
 Nearly 20% people of the world are suffering from various
disabilities; many of them are blind or unable to use their
hands effectively. they can share information with people by
operating computer through voice input.
 Our project is capable to recognize the speech and convert
the input audio into text; it also enables a user to perform
operations such as open calculator, wordpad, notepad, log off
computer.
APPLICATIONS
 In Car Systems
 Health Care
 Military
 Training air traffic controllers
 Telephony and other domains
 Usage in education and daily life
PERFORMANCE
The performance of speech recognition systems is usually
evaluated in terms of accuracy and speed. Accuracy is
usually rated with word error rate (WER), whereas speed is
measured with the real time factor. Other measures of
accuracy include Single Word Error Rate (SWER)
and Command Success Rate (CSR).
Accuracy
Accuracy of speech recognition vary
with the following:
 Vocabulary size and confusability
 Speaker dependence vs. independence
 Isolated, discontinuous, or continuous speech
 Task and language constraints
 Read vs. spontaneous speech
SYSTEM BLOCK DIAGRAM
Acoustic Model
An acoustic model is created by taking audio recordings of speech, and their text
transcriptions, and using software to create statistical representations of the sounds that
make up each word. It is used by a speech recognition engine to recognize speech.
Language Model
A language model is a file containing the probabilities of sequences of words. Language
models are used for dictation applications, whereas grammars are used in desktop
command and control or telephony interactive voice response (IVR) type applications.
Speech Engine
A speech engine is software that gives your computer the ability to play back text in a
spoken voice (referred to as text-to-speech or TTS).
HMM (HIDDEN MARKOV
MODEL)
These are statistical models that output a sequence of symbols or
quantities. HMMs are used in speech recognition because a speech
signal can be viewed as a piecewise stationary signal or a short-time
stationary signal. In a short time-scale (e.g., 10 milliseconds),
speech can be approximated as a stationary process. Speech can be
thought of as a Markov model for many stochastic purposes.
HMM Codebook
HMM Speech Process
Advantages
 Able to write the text both through keyboard and voice input .
 Voice recognition of different notepad commands such as
open save and clear.
 Open different windows soft wares, based on voice input.
 Lower operational costs.
 Provide significant help for the people with disabilities.
 Requires less consumption of time in writing text.
WEAKNESS
Homonyms:
Are the words that are differently spelled and have the different meaning but
acquires the same meaning, for example “there” “their”, “be” and “bee”. This is a
challenge for computer machine to distinguish between such types of phrases that
sound alike.
Speeches:
A second challenge in the process, is to understand the speech uttered by different
users, current systems have a difficulty to separate simultaneous speeches form
multiple users.
Noise factor:
the program requires hearing the words uttered by a human distinctly and clearly.
Any extra sound can create interference, first you need to place system away from
noisy environments and then speak clearly else the machine will confuse and will
mix up the words.
FUTURE SCOPE
 Accuracy will become better and better
 Dictation speech recognition will gradually become accepted
 Greater use will be made of “intelligent systems” which will
attempt to guess what the speaker intended to say, rather than what
was actually said, as people often misspeak and make unintentional
mistakes.
 Microphone and sound systems will be designed to adapt more
quickly to changing background noise levels, different
environments, with better recognition of extraneous material to be
discarded.

More Related Content

What's hot

ppt of gesture recognition
ppt of gesture recognitionppt of gesture recognition
ppt of gesture recognition
Aayush Agrawal
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
Hugo Moreno
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice Recognition
Amrita More
 
TEXT-SPEECH PPT.pptx
TEXT-SPEECH PPT.pptxTEXT-SPEECH PPT.pptx
TEXT-SPEECH PPT.pptx
Nsaroj kumar
 
Speech recognition system seminar
Speech recognition system seminarSpeech recognition system seminar
Speech recognition system seminar
Diptimaya Sarangi
 
Sign language translator ieee power point
Sign language translator ieee power pointSign language translator ieee power point
Sign language translator ieee power point
Madhuri Yellapu
 

What's hot (20)

Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
ppt of gesture recognition
ppt of gesture recognitionppt of gesture recognition
ppt of gesture recognition
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice Recognition
 
SPEECH RECOGNITION USING NEURAL NETWORK
SPEECH RECOGNITION USING NEURAL NETWORK SPEECH RECOGNITION USING NEURAL NETWORK
SPEECH RECOGNITION USING NEURAL NETWORK
 
TEXT-SPEECH PPT.pptx
TEXT-SPEECH PPT.pptxTEXT-SPEECH PPT.pptx
TEXT-SPEECH PPT.pptx
 
Speech recognition system seminar
Speech recognition system seminarSpeech recognition system seminar
Speech recognition system seminar
 
Speech recognition final presentation
Speech recognition final presentationSpeech recognition final presentation
Speech recognition final presentation
 
Artificial Intelligence for Speech Recognition
Artificial Intelligence for Speech RecognitionArtificial Intelligence for Speech Recognition
Artificial Intelligence for Speech Recognition
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Hand Gesture Recognition
Hand Gesture RecognitionHand Gesture Recognition
Hand Gesture Recognition
 
Text to-speech & voice recognition
Text to-speech & voice recognitionText to-speech & voice recognition
Text to-speech & voice recognition
 
Sign language translator ieee power point
Sign language translator ieee power pointSign language translator ieee power point
Sign language translator ieee power point
 
Speech Recognition by Iqbal
Speech Recognition by IqbalSpeech Recognition by Iqbal
Speech Recognition by Iqbal
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Speech Recognition by Iqbal
Speech Recognition by IqbalSpeech Recognition by Iqbal
Speech Recognition by Iqbal
 
Voice morphing-
Voice morphing-Voice morphing-
Voice morphing-
 
Automatic Speech Recognition
Automatic Speech RecognitionAutomatic Speech Recognition
Automatic Speech Recognition
 
Visual speech to text conversion applicable to telephone communication
Visual speech to text conversion  applicable  to telephone communicationVisual speech to text conversion  applicable  to telephone communication
Visual speech to text conversion applicable to telephone communication
 

Similar to Speech to text conversion

Paper on Speech Recognition
Paper on Speech RecognitionPaper on Speech Recognition
Paper on Speech Recognition
Thejus Joby
 
Abstract of speech recognition
Abstract of speech recognitionAbstract of speech recognition
Abstract of speech recognition
Vinay Jaisriram
 

Similar to Speech to text conversion (20)

Assign
AssignAssign
Assign
 
Seminar
SeminarSeminar
Seminar
 
Paper on Speech Recognition
Paper on Speech RecognitionPaper on Speech Recognition
Paper on Speech Recognition
 
AI for voice recognition.pptx
AI for voice recognition.pptxAI for voice recognition.pptx
AI for voice recognition.pptx
 
voice browser
voice browservoice browser
voice browser
 
Artificial Intelligence- An Introduction
Artificial Intelligence- An IntroductionArtificial Intelligence- An Introduction
Artificial Intelligence- An Introduction
 
Artificial Intelligence - An Introduction
Artificial Intelligence - An Introduction Artificial Intelligence - An Introduction
Artificial Intelligence - An Introduction
 
Google Voice-to-text
Google Voice-to-textGoogle Voice-to-text
Google Voice-to-text
 
Speech recognizers & generators
Speech recognizers & generatorsSpeech recognizers & generators
Speech recognizers & generators
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Abstract of speech recognition
Abstract of speech recognitionAbstract of speech recognition
Abstract of speech recognition
 
Bt35408413
Bt35408413Bt35408413
Bt35408413
 
VOICE RECOGNITION SYSTEM
VOICE RECOGNITION SYSTEMVOICE RECOGNITION SYSTEM
VOICE RECOGNITION SYSTEM
 
BTP paper
BTP paperBTP paper
BTP paper
 
ACHIEVING SECURITY VIA SPEECH RECOGNITION
ACHIEVING SECURITY VIA SPEECH RECOGNITIONACHIEVING SECURITY VIA SPEECH RECOGNITION
ACHIEVING SECURITY VIA SPEECH RECOGNITION
 
FINAL report
FINAL reportFINAL report
FINAL report
 
Ijetcas14 390
Ijetcas14 390Ijetcas14 390
Ijetcas14 390
 
visH (fin).pptx
visH (fin).pptxvisH (fin).pptx
visH (fin).pptx
 
Presentation 204 lisa bruening aac in times of change
Presentation 204  lisa bruening aac in times of changePresentation 204  lisa bruening aac in times of change
Presentation 204 lisa bruening aac in times of change
 
30
3030
30
 

Recently uploaded

Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 

Recently uploaded (20)

BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Generative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPTGenerative AI or GenAI technology based PPT
Generative AI or GenAI technology based PPT
 
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Ramesh Nagar Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01Double rodded leveling 1 pdf activity 01
Double rodded leveling 1 pdf activity 01
 
Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...Call for Papers - International Journal of Intelligent Systems and Applicatio...
Call for Papers - International Journal of Intelligent Systems and Applicatio...
 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
Thermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - VThermal Engineering-R & A / C - unit - V
Thermal Engineering-R & A / C - unit - V
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 

Speech to text conversion

  • 2. Why this project?  Speech recognition technology is one from the fast growing engineering technologies.  Nearly 20% people of the world are suffering from various disabilities; many of them are blind or unable to use their hands effectively. they can share information with people by operating computer through voice input.  Our project is capable to recognize the speech and convert the input audio into text; it also enables a user to perform operations such as open calculator, wordpad, notepad, log off computer.
  • 3. APPLICATIONS  In Car Systems  Health Care  Military  Training air traffic controllers  Telephony and other domains  Usage in education and daily life
  • 4. PERFORMANCE The performance of speech recognition systems is usually evaluated in terms of accuracy and speed. Accuracy is usually rated with word error rate (WER), whereas speed is measured with the real time factor. Other measures of accuracy include Single Word Error Rate (SWER) and Command Success Rate (CSR).
  • 5. Accuracy Accuracy of speech recognition vary with the following:  Vocabulary size and confusability  Speaker dependence vs. independence  Isolated, discontinuous, or continuous speech  Task and language constraints  Read vs. spontaneous speech
  • 7. Acoustic Model An acoustic model is created by taking audio recordings of speech, and their text transcriptions, and using software to create statistical representations of the sounds that make up each word. It is used by a speech recognition engine to recognize speech. Language Model A language model is a file containing the probabilities of sequences of words. Language models are used for dictation applications, whereas grammars are used in desktop command and control or telephony interactive voice response (IVR) type applications. Speech Engine A speech engine is software that gives your computer the ability to play back text in a spoken voice (referred to as text-to-speech or TTS).
  • 8. HMM (HIDDEN MARKOV MODEL) These are statistical models that output a sequence of symbols or quantities. HMMs are used in speech recognition because a speech signal can be viewed as a piecewise stationary signal or a short-time stationary signal. In a short time-scale (e.g., 10 milliseconds), speech can be approximated as a stationary process. Speech can be thought of as a Markov model for many stochastic purposes.
  • 11.
  • 12. Advantages  Able to write the text both through keyboard and voice input .  Voice recognition of different notepad commands such as open save and clear.  Open different windows soft wares, based on voice input.  Lower operational costs.  Provide significant help for the people with disabilities.  Requires less consumption of time in writing text.
  • 13. WEAKNESS Homonyms: Are the words that are differently spelled and have the different meaning but acquires the same meaning, for example “there” “their”, “be” and “bee”. This is a challenge for computer machine to distinguish between such types of phrases that sound alike. Speeches: A second challenge in the process, is to understand the speech uttered by different users, current systems have a difficulty to separate simultaneous speeches form multiple users. Noise factor: the program requires hearing the words uttered by a human distinctly and clearly. Any extra sound can create interference, first you need to place system away from noisy environments and then speak clearly else the machine will confuse and will mix up the words.
  • 14. FUTURE SCOPE  Accuracy will become better and better  Dictation speech recognition will gradually become accepted  Greater use will be made of “intelligent systems” which will attempt to guess what the speaker intended to say, rather than what was actually said, as people often misspeak and make unintentional mistakes.  Microphone and sound systems will be designed to adapt more quickly to changing background noise levels, different environments, with better recognition of extraneous material to be discarded.