SlideShare a Scribd company logo
Speech Recognition
An Overview

Submitted To:

Submitted By:

Name: Dr. M Syamala Devi

Name: Varun Jain

Designation: Professor

Roll No.: 34

Department: DCSA, Panjab
University, Chandigarh

Class: MCA 5th Semester (M)

Panjab University, Chandigarh
Table of Contents


Introduction



Introduction – A closer look



Terms and Concepts in Speech Recognition



Pronunciation





Utterances
Grammar

How it works


Acoustic Model



Grammar



Recognized Text



Performance Measurement



Standards



Google Search by Voice (GSbV) – A case study


GSbV – A screenshot



GSbV – An example



Conclusions



References
Introduction


Have you ever talked to your computer?



I mean, have you really, really talked to your computer?



Where it actually recognized what you said and then did something as a result?



If you have, then you've used a technology known as speech recognition.



It is the process of converting spoken input to text.



Speech recognition allows you to provide input to an application with your voice.



Speech recognition is thus sometimes referred to as speech-to-text.



In the desktop world, you need a microphone to be able to do this.



The speech recognition process is performed by a software component known as
the speech recognition engine.



The primary function of the speech recognition engine is to process spoken input
and translate it into text that an application understands.
Introduction – A closer look


Upon hearing a voice as input, a speech recognition application can do one of
the following thing:






The application can interpret the result of the recognition as a command. In this
case, the application is a command and control application. An example of a
command and control application is one in which the caller says “check balance”,
and the application returns the current balance of the caller’s account.
If an application handles the recognized text simply as text, then it is considered a
dictation application. In a dictation application, if you said “check balance,” the
application would not interpret the result, but simply return the text “check
balance”.

For example, in response to hearing "Please say coffee, tea, or milk," you say
"coffee" and the speech recognition application you're calling tells you what
the flavor of the day is and then asks if you'd like to place an order.
Terms and Concepts in Speech
Recognition


There are basically three terms that are majorly used in context of speech
recognition concept.



These three are:


Utterance



Pronunciation



Grammar
Utterance


When the user says something, this is known as an utterance.



An utterance is any stream of speech between two periods of silence.



Utterances are sent to the speech engine to be processed.



Silence, in speech recognition, is almost as important as what is spoken, because
silence delineates the start and end of an utterance.



The speech recognition engine is "listening" for speech input.



When the engine detects audio input - in other words, a lack of silence -- the
beginning of an utterance is signaled.



Similarly, when the engine detects a certain amount of silence following the
audio, the end of the utterance occurs.



If the user doesn’t say anything, the engine returns what is known as a silence
timeout - an indication that there was no speech detected within the expected
timeframe, and the application takes an appropriate action, such as re-prompting
the user for input.
Pronunciation


The speech recognition engine uses all sorts of data, statistical models, and
algorithms to convert spoken input into text.



One piece of information that the speech recognition engine uses to process a
word is its pronunciation, which represents what the speech engine thinks a
word should sound like.



Words can have multiple pronunciations associated with them.



For example, the word “the” has at least two pronunciations in the U.S.
English language: “thee” and “thuh.”
Grammar


One must specify the words and phrases that users can say to your
application.



These words and phrases are defined to the speech recognition engine and
are used in the recognition process.



You can specify the valid words and phrases in a number of different ways,
but in application, you do this by specifying a grammar.



A grammar uses a particular syntax, or set of rules, to define the words and
phrases that can be recognized by the engine.



A grammar can be as simple as a list of words, or it can be flexible enough to
allow such variability in what can be said that it approaches natural language
capability.
How it works

Fig. 1 A basic working model of speech recognition
depicting how a general speech recognition works.
Acoustic Model


Acoustic models provide an estimate for the likelihood of the observed
features in a frame of speech given a particular phonetic context.



The features are typically related to measurements of the spectral
characteristics of a time-slice of speech.



While individual recipes for training acoustic models vary in their structure
and sequencing, the basic process involves aligning transcribed speech to
states within an existing acoustic model, accumulating frames associated with
each state, and re-estimating the probability distributions associated with the
state, given the features observed in those frames.
Grammar


A grammar can be as simple as a list of words, or it can be flexible enough to
allow such variability in what can be said that it approaches natural language
capability.



Grammars define the domain, or context, within which the recognition engine
works.



The engine compares the current utterance against the words and phrases in the
active grammars.



If the user says something that is not in the grammar, the speech engine will not
be able to decipher it correctly.



You can define a single grammar for your application, or you may have multiple
grammars.



Chances are, you will have multiple grammars, and you will activate each
grammar only when it is needed.
Recognized Text


It is an output of this model that produces a likelihood if what user has
provided as input commonly known as recognized text.



This text can be processed in two way by the application using this model:


As a command



As a dictation text



As a command, a further processing is made upon the text which serves as
output



As a dictation text, a stream of text is shown as output as a formatted text
just like a text can be shown on screen.
Performance Measurement


The performance of a speech recognition system is measurable.



Perhaps the most widely used measurement is accuracy.



Arguably the most important measurement of accuracy is whether the desired
end result occurred.



This measurement is useful in validating application design.



For example, if the user said "yes," the engine returned "yes," and the "YES"
action was executed, it is clear that the desired end result was achieved.



But what happens if the engine returns text that does not exactly match the
utterance?



For example, what if the user said "nope," the engine returned "no," yet the
"NO" action was executed?
Standard


There are a lot of standards by World Wide Web Consortium (W3C)



Some of them are enlisted below:


VoiceXML



SRGS (Speech Recognition Grammar Specification)



SISR (Semantic Interpretation for Speech Recognition)



SSML (Speech Synthesis Markup Language)



PLS (Pronunciation Lexicon Specification)



CCXML (Call Control eXtensible Markup Language)



MediaCTRL



MSML (Media Server Markup Language)



MSCML (Media Server Control Markup Language)
Google search by voice: A case study


Mobile web search is a rapidly growing area of interest.



Internet-enabled Smart phones account for an increasing share of the mobile
devices sold throughout the world, and most models offer a web browsing
experience that rivals desktop computers in display quality.



Users are increasingly turning to their mobile devices when doing web
searches, driving efforts to enhance the usability of web search on these
devices.



Although mobile device usability has improved, typing search queries can still
be cumbersome, error-prone, and even dangerous in some usage scenarios.
GSbV – A screenshot
GSbV – An example

Figure 3. An example of a speech recognizer app GSbV
Conclusions


Speech recognition will revolutionize the way people conduct business over
the Web and will, ultimately, differentiate world-class e-businesses.



VoiceXML ties speech recognition and telephony together and provides the
technology with which businesses can develop and deploy voice-enabled Web
solutions TODAY!



These solutions can greatly expand the accessibility of Web-based self-service
transactions to customers who would otherwise not have access, and, at the
same time, leverage a business’ existing Web investments.



Speech recognition and VoiceXML clearly represent the next wave of the Web.
References



http://cmusphinx.sourceforge.net/wiki/tutorialconcepts



ftp://public.dhe.ibm.com/software/pervasive/info/products/Introduction_to
_Speech_Reco gnition.pdf



https://www.google.co.in/url?sa=t&rct=j&q=&esrc=s&source=web&cd=2&ved
=0CDAQFjAB&url=http%3A%2F%2Fresearch.google.com%2Fpubs%2Farchive%2F3
6340.pdf&ei=13GVUoW5CY6FrAeZ04GgDg&usg=AFQjCNEWeZqQk1pIkXjtzrNG__
ZR-UUNbA&bvm=bv.57155469,d.bmk&cad=rja



http://en.wikipedia.org/wiki/VoiceXML

More Related Content

What's hot

Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
Seminar Links
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice RecognitionAmrita More
 
Speech Recognition by Iqbal
Speech Recognition by IqbalSpeech Recognition by Iqbal
Speech Recognition by Iqbal
Iqbal
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
Aamir-sheriff
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
Manthan Gandhi
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
Ankit Gujrati
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
Ahmed Moawad
 
Speech recognition final presentation
Speech recognition final presentationSpeech recognition final presentation
Speech recognition final presentation
himanshubhatti
 
A seminar report on speech recognition technology
A seminar report on speech recognition technologyA seminar report on speech recognition technology
A seminar report on speech recognition technology
SrijanKumar18
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
Richie
 
Speech Recognition
Speech Recognition Speech Recognition
Speech Recognition
Goa App
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversionankit_saluja
 
Unit 1 speech processing
Unit 1 speech processingUnit 1 speech processing
Unit 1 speech processing
azhagujaisudhan
 
Speech to text conversion for visually impaired person using µ law companding
Speech to text conversion for visually impaired person using µ law compandingSpeech to text conversion for visually impaired person using µ law companding
Speech to text conversion for visually impaired person using µ law companding
iosrjce
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognition
Charu Joshi
 
speech processing and recognition basic in data mining
speech processing and recognition basic in  data miningspeech processing and recognition basic in  data mining
speech processing and recognition basic in data mining
Jimit Rupani
 
Automatic speech recognition system
Automatic speech recognition systemAutomatic speech recognition system
Automatic speech recognition system
Alok Tiwari
 
Automatic speech recognition system
Automatic speech recognition systemAutomatic speech recognition system
Automatic speech recognition systemAlok Tiwari
 
Visual speech to text conversion applicable to telephone communication
Visual speech to text conversion  applicable  to telephone communicationVisual speech to text conversion  applicable  to telephone communication
Visual speech to text conversion applicable to telephone communication
Swathi Venugopal
 
Speech recognition techniques
Speech recognition techniquesSpeech recognition techniques
Speech recognition techniques
sonukumar142
 

What's hot (20)

Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice Recognition
 
Speech Recognition by Iqbal
Speech Recognition by IqbalSpeech Recognition by Iqbal
Speech Recognition by Iqbal
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Speech recognition final presentation
Speech recognition final presentationSpeech recognition final presentation
Speech recognition final presentation
 
A seminar report on speech recognition technology
A seminar report on speech recognition technologyA seminar report on speech recognition technology
A seminar report on speech recognition technology
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Speech Recognition
Speech Recognition Speech Recognition
Speech Recognition
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
 
Unit 1 speech processing
Unit 1 speech processingUnit 1 speech processing
Unit 1 speech processing
 
Speech to text conversion for visually impaired person using µ law companding
Speech to text conversion for visually impaired person using µ law compandingSpeech to text conversion for visually impaired person using µ law companding
Speech to text conversion for visually impaired person using µ law companding
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognition
 
speech processing and recognition basic in data mining
speech processing and recognition basic in  data miningspeech processing and recognition basic in  data mining
speech processing and recognition basic in data mining
 
Automatic speech recognition system
Automatic speech recognition systemAutomatic speech recognition system
Automatic speech recognition system
 
Automatic speech recognition system
Automatic speech recognition systemAutomatic speech recognition system
Automatic speech recognition system
 
Visual speech to text conversion applicable to telephone communication
Visual speech to text conversion  applicable  to telephone communicationVisual speech to text conversion  applicable  to telephone communication
Visual speech to text conversion applicable to telephone communication
 
Speech recognition techniques
Speech recognition techniquesSpeech recognition techniques
Speech recognition techniques
 

Viewers also liked

Speech Recognition by Iqbal
Speech Recognition by IqbalSpeech Recognition by Iqbal
Speech Recognition by Iqbal
Iqbal
 
Artificial intelligence Speech recognition system
Artificial intelligence Speech recognition systemArtificial intelligence Speech recognition system
Artificial intelligence Speech recognition system
REHMAT ULLAH
 
Speech recognition project report
Speech recognition project reportSpeech recognition project report
Speech recognition project reportSarang Afle
 
Dev Days, Speech Recognition, LM Aubert
Dev Days, Speech Recognition, LM AubertDev Days, Speech Recognition, LM Aubert
Dev Days, Speech Recognition, LM Aubert
aubertlm
 
Blackboard Pattern
Blackboard PatternBlackboard Pattern
Blackboard Patterntcab22
 
Blackboard architecture pattern
Blackboard architecture patternBlackboard architecture pattern
Blackboard architecture pattern
aish006
 
The Main Concepts of Speech Recognition
The Main Concepts of Speech RecognitionThe Main Concepts of Speech Recognition
The Main Concepts of Speech Recognition子毅 楊
 
fundamentals of speech recognition
fundamentals of speech recognitionfundamentals of speech recognition
fundamentals of speech recognition
Venkat RAGHAVENDRA REDDY
 
Rajul computer presentation
Rajul computer presentationRajul computer presentation
Rajul computer presentationNeetu Jain
 
Speech recognition challenges
Speech recognition challengesSpeech recognition challenges
Speech recognition challengesAlexandru Chica
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Chiranjeevi Adi
 
Applying Blackboard Systems to First Person Shooters
Applying Blackboard Systems to First Person ShootersApplying Blackboard Systems to First Person Shooters
Applying Blackboard Systems to First Person Shootershbbalfred
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognitionboddu syamprasad
 
AUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEYAUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEY
IJCERT
 

Viewers also liked (16)

Speech Recognition by Iqbal
Speech Recognition by IqbalSpeech Recognition by Iqbal
Speech Recognition by Iqbal
 
Artificial intelligence Speech recognition system
Artificial intelligence Speech recognition systemArtificial intelligence Speech recognition system
Artificial intelligence Speech recognition system
 
An Introduction To Speech Recognition
An Introduction To Speech RecognitionAn Introduction To Speech Recognition
An Introduction To Speech Recognition
 
Speech recognition project report
Speech recognition project reportSpeech recognition project report
Speech recognition project report
 
Dev Days, Speech Recognition, LM Aubert
Dev Days, Speech Recognition, LM AubertDev Days, Speech Recognition, LM Aubert
Dev Days, Speech Recognition, LM Aubert
 
Blackboard Pattern
Blackboard PatternBlackboard Pattern
Blackboard Pattern
 
Blackboard architecture pattern
Blackboard architecture patternBlackboard architecture pattern
Blackboard architecture pattern
 
The Main Concepts of Speech Recognition
The Main Concepts of Speech RecognitionThe Main Concepts of Speech Recognition
The Main Concepts of Speech Recognition
 
fundamentals of speech recognition
fundamentals of speech recognitionfundamentals of speech recognition
fundamentals of speech recognition
 
Rajul computer presentation
Rajul computer presentationRajul computer presentation
Rajul computer presentation
 
Speech recognition challenges
Speech recognition challengesSpeech recognition challenges
Speech recognition challenges
 
Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks Hand Written Character Recognition Using Neural Networks
Hand Written Character Recognition Using Neural Networks
 
Applying Blackboard Systems to First Person Shooters
Applying Blackboard Systems to First Person ShootersApplying Blackboard Systems to First Person Shooters
Applying Blackboard Systems to First Person Shooters
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
AUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEYAUTOMATIC SPEECH RECOGNITION- A SURVEY
AUTOMATIC SPEECH RECOGNITION- A SURVEY
 

Similar to Speech recognition an overview

Tutorial - Speech Synthesis System
Tutorial - Speech Synthesis SystemTutorial - Speech Synthesis System
Tutorial - Speech Synthesis System
IJERA Editor
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversionankit_saluja
 
IRJET- Voice based Billing System
IRJET-  	  Voice based Billing SystemIRJET-  	  Voice based Billing System
IRJET- Voice based Billing System
IRJET Journal
 
ACHIEVING SECURITY VIA SPEECH RECOGNITION
ACHIEVING SECURITY VIA SPEECH RECOGNITIONACHIEVING SECURITY VIA SPEECH RECOGNITION
ACHIEVING SECURITY VIA SPEECH RECOGNITION
ijistjournal
 
Paper on Speech Recognition
Paper on Speech RecognitionPaper on Speech Recognition
Paper on Speech RecognitionThejus Joby
 
Deciphering voice of customer through speech analytics
Deciphering voice of customer through speech analyticsDeciphering voice of customer through speech analytics
Deciphering voice of customer through speech analytics
R Systems International
 
Instant speech translation 10BM60080 - VGSOM
Instant speech translation   10BM60080 - VGSOMInstant speech translation   10BM60080 - VGSOM
Instant speech translation 10BM60080 - VGSOM
sathiyaseelanm
 
NLP and its applications
NLP and its applicationsNLP and its applications
NLP and its applicationsUtphala P
 
Developing a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice commandDeveloping a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice command
Mohammad Liton Hossain
 
Allin Qillqay A Free On-Line Web Spell Checking Service For Quechua
Allin Qillqay  A Free On-Line Web Spell Checking Service For QuechuaAllin Qillqay  A Free On-Line Web Spell Checking Service For Quechua
Allin Qillqay A Free On-Line Web Spell Checking Service For Quechua
Andrea Porter
 
NICE Speech Analytics - White Paper
NICE Speech Analytics - White PaperNICE Speech Analytics - White Paper
NICE Speech Analytics - White PaperClark Hill
 
Google Voice-to-text
Google Voice-to-textGoogle Voice-to-text
Google Voice-to-text
Trần Hữu Tuấn
 
Voice Recognition System using Template Matching
Voice Recognition System using Template MatchingVoice Recognition System using Template Matching
Voice Recognition System using Template Matching
IJORCS
 
Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
Tulsa Techfest 2008 - Creating A Voice User Interface With Speech ServerTulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
Jason Townsend, MBA
 
Stemming is one of several text normalization techniques that converts raw te...
Stemming is one of several text normalization techniques that converts raw te...Stemming is one of several text normalization techniques that converts raw te...
Stemming is one of several text normalization techniques that converts raw te...
NALESVPMEngg
 
EasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdfEasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdf
NohaGhoweil
 
NLP Techniques for Speech Recognition.docx
NLP Techniques for Speech Recognition.docxNLP Techniques for Speech Recognition.docx
NLP Techniques for Speech Recognition.docx
KevinSims18
 

Similar to Speech recognition an overview (20)

Seminar
SeminarSeminar
Seminar
 
Tutorial - Speech Synthesis System
Tutorial - Speech Synthesis SystemTutorial - Speech Synthesis System
Tutorial - Speech Synthesis System
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
 
IRJET- Voice based Billing System
IRJET-  	  Voice based Billing SystemIRJET-  	  Voice based Billing System
IRJET- Voice based Billing System
 
ACHIEVING SECURITY VIA SPEECH RECOGNITION
ACHIEVING SECURITY VIA SPEECH RECOGNITIONACHIEVING SECURITY VIA SPEECH RECOGNITION
ACHIEVING SECURITY VIA SPEECH RECOGNITION
 
Paper on Speech Recognition
Paper on Speech RecognitionPaper on Speech Recognition
Paper on Speech Recognition
 
Deciphering voice of customer through speech analytics
Deciphering voice of customer through speech analyticsDeciphering voice of customer through speech analytics
Deciphering voice of customer through speech analytics
 
Instant speech translation 10BM60080 - VGSOM
Instant speech translation   10BM60080 - VGSOMInstant speech translation   10BM60080 - VGSOM
Instant speech translation 10BM60080 - VGSOM
 
NLP and its applications
NLP and its applicationsNLP and its applications
NLP and its applications
 
Developing a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice commandDeveloping a hands-free interface to operate a Computer using voice command
Developing a hands-free interface to operate a Computer using voice command
 
Allin Qillqay A Free On-Line Web Spell Checking Service For Quechua
Allin Qillqay  A Free On-Line Web Spell Checking Service For QuechuaAllin Qillqay  A Free On-Line Web Spell Checking Service For Quechua
Allin Qillqay A Free On-Line Web Spell Checking Service For Quechua
 
NICE Speech Analytics - White Paper
NICE Speech Analytics - White PaperNICE Speech Analytics - White Paper
NICE Speech Analytics - White Paper
 
Google Voice-to-text
Google Voice-to-textGoogle Voice-to-text
Google Voice-to-text
 
10
1010
10
 
Ijetcas14 390
Ijetcas14 390Ijetcas14 390
Ijetcas14 390
 
Voice Recognition System using Template Matching
Voice Recognition System using Template MatchingVoice Recognition System using Template Matching
Voice Recognition System using Template Matching
 
Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
Tulsa Techfest 2008 - Creating A Voice User Interface With Speech ServerTulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
Tulsa Techfest 2008 - Creating A Voice User Interface With Speech Server
 
Stemming is one of several text normalization techniques that converts raw te...
Stemming is one of several text normalization techniques that converts raw te...Stemming is one of several text normalization techniques that converts raw te...
Stemming is one of several text normalization techniques that converts raw te...
 
EasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdfEasyChair-Preprint-7375.pdf
EasyChair-Preprint-7375.pdf
 
NLP Techniques for Speech Recognition.docx
NLP Techniques for Speech Recognition.docxNLP Techniques for Speech Recognition.docx
NLP Techniques for Speech Recognition.docx
 

More from Varun Jain

Mobile devices
Mobile devicesMobile devices
Mobile devices
Varun Jain
 
Java in web 2 0 presentation
Java in web 2 0 presentationJava in web 2 0 presentation
Java in web 2 0 presentationVarun Jain
 
Ready transmesia media
Ready transmesia mediaReady transmesia media
Ready transmesia mediaVarun Jain
 
Computer communications and networks
Computer communications and networksComputer communications and networks
Computer communications and networksVarun Jain
 
The brain computing
The brain computingThe brain computing
The brain computingVarun Jain
 
Data warehousing
Data warehousingData warehousing
Data warehousingVarun Jain
 
Billing software
Billing softwareBilling software
Billing softwareVarun Jain
 
Making power point slides
Making power point slidesMaking power point slides
Making power point slidesVarun Jain
 

More from Varun Jain (8)

Mobile devices
Mobile devicesMobile devices
Mobile devices
 
Java in web 2 0 presentation
Java in web 2 0 presentationJava in web 2 0 presentation
Java in web 2 0 presentation
 
Ready transmesia media
Ready transmesia mediaReady transmesia media
Ready transmesia media
 
Computer communications and networks
Computer communications and networksComputer communications and networks
Computer communications and networks
 
The brain computing
The brain computingThe brain computing
The brain computing
 
Data warehousing
Data warehousingData warehousing
Data warehousing
 
Billing software
Billing softwareBilling software
Billing software
 
Making power point slides
Making power point slidesMaking power point slides
Making power point slides
 

Recently uploaded

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 

Recently uploaded (20)

GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 

Speech recognition an overview

  • 1. Speech Recognition An Overview Submitted To: Submitted By: Name: Dr. M Syamala Devi Name: Varun Jain Designation: Professor Roll No.: 34 Department: DCSA, Panjab University, Chandigarh Class: MCA 5th Semester (M) Panjab University, Chandigarh
  • 2. Table of Contents  Introduction  Introduction – A closer look  Terms and Concepts in Speech Recognition   Pronunciation   Utterances Grammar How it works  Acoustic Model  Grammar  Recognized Text  Performance Measurement  Standards  Google Search by Voice (GSbV) – A case study  GSbV – A screenshot  GSbV – An example  Conclusions  References
  • 3. Introduction  Have you ever talked to your computer?  I mean, have you really, really talked to your computer?  Where it actually recognized what you said and then did something as a result?  If you have, then you've used a technology known as speech recognition.  It is the process of converting spoken input to text.  Speech recognition allows you to provide input to an application with your voice.  Speech recognition is thus sometimes referred to as speech-to-text.  In the desktop world, you need a microphone to be able to do this.  The speech recognition process is performed by a software component known as the speech recognition engine.  The primary function of the speech recognition engine is to process spoken input and translate it into text that an application understands.
  • 4. Introduction – A closer look  Upon hearing a voice as input, a speech recognition application can do one of the following thing:    The application can interpret the result of the recognition as a command. In this case, the application is a command and control application. An example of a command and control application is one in which the caller says “check balance”, and the application returns the current balance of the caller’s account. If an application handles the recognized text simply as text, then it is considered a dictation application. In a dictation application, if you said “check balance,” the application would not interpret the result, but simply return the text “check balance”. For example, in response to hearing "Please say coffee, tea, or milk," you say "coffee" and the speech recognition application you're calling tells you what the flavor of the day is and then asks if you'd like to place an order.
  • 5. Terms and Concepts in Speech Recognition  There are basically three terms that are majorly used in context of speech recognition concept.  These three are:  Utterance  Pronunciation  Grammar
  • 6. Utterance  When the user says something, this is known as an utterance.  An utterance is any stream of speech between two periods of silence.  Utterances are sent to the speech engine to be processed.  Silence, in speech recognition, is almost as important as what is spoken, because silence delineates the start and end of an utterance.  The speech recognition engine is "listening" for speech input.  When the engine detects audio input - in other words, a lack of silence -- the beginning of an utterance is signaled.  Similarly, when the engine detects a certain amount of silence following the audio, the end of the utterance occurs.  If the user doesn’t say anything, the engine returns what is known as a silence timeout - an indication that there was no speech detected within the expected timeframe, and the application takes an appropriate action, such as re-prompting the user for input.
  • 7. Pronunciation  The speech recognition engine uses all sorts of data, statistical models, and algorithms to convert spoken input into text.  One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like.  Words can have multiple pronunciations associated with them.  For example, the word “the” has at least two pronunciations in the U.S. English language: “thee” and “thuh.”
  • 8. Grammar  One must specify the words and phrases that users can say to your application.  These words and phrases are defined to the speech recognition engine and are used in the recognition process.  You can specify the valid words and phrases in a number of different ways, but in application, you do this by specifying a grammar.  A grammar uses a particular syntax, or set of rules, to define the words and phrases that can be recognized by the engine.  A grammar can be as simple as a list of words, or it can be flexible enough to allow such variability in what can be said that it approaches natural language capability.
  • 9. How it works Fig. 1 A basic working model of speech recognition depicting how a general speech recognition works.
  • 10. Acoustic Model  Acoustic models provide an estimate for the likelihood of the observed features in a frame of speech given a particular phonetic context.  The features are typically related to measurements of the spectral characteristics of a time-slice of speech.  While individual recipes for training acoustic models vary in their structure and sequencing, the basic process involves aligning transcribed speech to states within an existing acoustic model, accumulating frames associated with each state, and re-estimating the probability distributions associated with the state, given the features observed in those frames.
  • 11. Grammar  A grammar can be as simple as a list of words, or it can be flexible enough to allow such variability in what can be said that it approaches natural language capability.  Grammars define the domain, or context, within which the recognition engine works.  The engine compares the current utterance against the words and phrases in the active grammars.  If the user says something that is not in the grammar, the speech engine will not be able to decipher it correctly.  You can define a single grammar for your application, or you may have multiple grammars.  Chances are, you will have multiple grammars, and you will activate each grammar only when it is needed.
  • 12. Recognized Text  It is an output of this model that produces a likelihood if what user has provided as input commonly known as recognized text.  This text can be processed in two way by the application using this model:  As a command  As a dictation text  As a command, a further processing is made upon the text which serves as output  As a dictation text, a stream of text is shown as output as a formatted text just like a text can be shown on screen.
  • 13. Performance Measurement  The performance of a speech recognition system is measurable.  Perhaps the most widely used measurement is accuracy.  Arguably the most important measurement of accuracy is whether the desired end result occurred.  This measurement is useful in validating application design.  For example, if the user said "yes," the engine returned "yes," and the "YES" action was executed, it is clear that the desired end result was achieved.  But what happens if the engine returns text that does not exactly match the utterance?  For example, what if the user said "nope," the engine returned "no," yet the "NO" action was executed?
  • 14. Standard  There are a lot of standards by World Wide Web Consortium (W3C)  Some of them are enlisted below:  VoiceXML  SRGS (Speech Recognition Grammar Specification)  SISR (Semantic Interpretation for Speech Recognition)  SSML (Speech Synthesis Markup Language)  PLS (Pronunciation Lexicon Specification)  CCXML (Call Control eXtensible Markup Language)  MediaCTRL  MSML (Media Server Markup Language)  MSCML (Media Server Control Markup Language)
  • 15. Google search by voice: A case study  Mobile web search is a rapidly growing area of interest.  Internet-enabled Smart phones account for an increasing share of the mobile devices sold throughout the world, and most models offer a web browsing experience that rivals desktop computers in display quality.  Users are increasingly turning to their mobile devices when doing web searches, driving efforts to enhance the usability of web search on these devices.  Although mobile device usability has improved, typing search queries can still be cumbersome, error-prone, and even dangerous in some usage scenarios.
  • 16. GSbV – A screenshot
  • 17. GSbV – An example Figure 3. An example of a speech recognizer app GSbV
  • 18. Conclusions  Speech recognition will revolutionize the way people conduct business over the Web and will, ultimately, differentiate world-class e-businesses.  VoiceXML ties speech recognition and telephony together and provides the technology with which businesses can develop and deploy voice-enabled Web solutions TODAY!  These solutions can greatly expand the accessibility of Web-based self-service transactions to customers who would otherwise not have access, and, at the same time, leverage a business’ existing Web investments.  Speech recognition and VoiceXML clearly represent the next wave of the Web.