Voice Recognition (Presentation 2)
By:
Priya Devi A.
S/W Developer,
Xsys technologies
Bangalore.
Preparing Grammar
• The grammar file has been extended to 56 tokens.
• The grammar file can be generated dynamically.
• A user interface for entering a grammar token and its action is implemented.
• Tokens entered into the grammar file are recognized by the Sphinx recognizer when it detects them in the microphone input.
• Actions are associated with tokens and stored in a hash table.
• The grammar file follows the JSGF format.
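
The points above (dynamic grammar generation plus a hash table of actions) could be sketched roughly as follows. This is only an illustration: the class GrammarBuilder, its method names, and the example actions are hypothetical and do not come from the project; only the JSGF header and rule syntax follow the JSGF format.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: keeps token -> action in a hash table and renders JSGF text. */
final class GrammarBuilder {
    // Token -> action (here just a command string). LinkedHashMap is a hash table
    // that also remembers insertion order, so the generated rule is stable.
    private final Map<String, String> actions = new LinkedHashMap<>();

    void addToken(String token, String action) {
        actions.put(token, action);
    }

    /** Returns the action registered for a recognized token, or null if none. */
    String actionFor(String token) {
        return actions.get(token);
    }

    /** Renders all tokens as one public JSGF rule of alternatives. */
    String toJsgf(String grammarName, String ruleName) {
        StringJoiner alternatives = new StringJoiner(" | ");
        for (String token : actions.keySet()) {
            alternatives.add(token);
        }
        return "#JSGF V1.0;\n"
             + "grammar " + grammarName + ";\n"
             + "public <" + ruleName + "> = " + alternatives + ";\n";
    }

    public static void main(String[] args) {
        GrammarBuilder builder = new GrammarBuilder();
        builder.addToken("open computer", "explorer.exe");  // example action, assumed
        builder.addToken("open player", "wmplayer.exe");    // example action, assumed
        System.out.print(builder.toJsgf("commands", "desktopAction"));
    }
}
```

The sketch does not guard against an empty token table, which would produce an empty (invalid) rule.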
JSGF (Java Speech Grammar Format)
• The JSpeech Grammar Format (JSGF), also known as the Java Speech Grammar Format, is a platform-independent, vendor-independent textual representation of grammars for use in speech recognition.
• Example token definitions in JSGF:
public <desktopAction> = open (Computer | Document | Recycle |
Network | <defaultApplication> );
public <defaultApplication> = player | word | powerpoint | internet |
start | tasks ;
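
To make the example concrete, the following sketch lists the phrases that the <desktopAction> rule accepts by expanding its alternatives by hand. The class and method names are hypothetical; the word lists are copied from the rule definitions above. Because <defaultApplication> is also declared public, the bare words (player, word, and so on) can be matched on their own as well; the sketch only covers <desktopAction>.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: expands the <desktopAction> rule into plain phrases. */
final class DesktopActionPhrases {
    // Alternatives of <defaultApplication>, copied from the slide.
    static final List<String> DEFAULT_APPLICATION =
            List.of("player", "word", "powerpoint", "internet", "start", "tasks");

    // Literal alternatives of <desktopAction> that follow "open".
    static final List<String> LOCATIONS =
            List.of("Computer", "Document", "Recycle", "Network");

    /** Every phrase matched by <desktopAction>: "open" plus one alternative. */
    static List<String> desktopActionPhrases() {
        List<String> phrases = new ArrayList<>();
        for (String location : LOCATIONS) {
            phrases.add("open " + location);
        }
        // The rule reference <defaultApplication> contributes each of its alternatives.
        for (String app : DEFAULT_APPLICATION) {
            phrases.add("open " + app);
        }
        return phrases;
    }

    public static void main(String[] args) {
        desktopActionPhrases().forEach(System.out::println);
    }
}
```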
Major Challenge - Accuracy
• Accuracy is currently only 45%.
• Accuracy depends on many factors, such as background noise and microphone quality.
• Accuracy depends heavily on the recognizer.
• The recognizer searches the grammar file for tokens using a best-first scheme.
• The best-first scheme fails when the textual comparison picks the wrong match. For example, “word” can be recognized as “ward”.
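
The slides do not say how the accuracy figure was measured. One common choice, assumed here purely for illustration, is the fraction of test utterances whose recognized text exactly matches what was spoken. The class name and the sample data below are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch: utterance-level accuracy as correct / total. */
final class AccuracyMeter {
    /** Fraction of positions where recognized text equals expected text, ignoring case. */
    static double accuracy(List<String> expected, List<String> recognized) {
        if (expected.size() != recognized.size()) {
            throw new IllegalArgumentException("lists must have the same length");
        }
        if (expected.isEmpty()) {
            return 0.0;
        }
        int correct = 0;
        for (int i = 0; i < expected.size(); i++) {
            if (expected.get(i).equalsIgnoreCase(recognized.get(i))) {
                correct++;
            }
        }
        return (double) correct / expected.size();
    }

    public static void main(String[] args) {
        // Made-up sample: one of four utterances is misrecognized ("word" as "ward").
        List<String> spoken = List.of("open word", "open player", "open computer", "open internet");
        List<String> heard = List.of("open ward", "open player", "open computer", "open internet");
        System.out.println(accuracy(spoken, heard));
    }
}
```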
Improving Accuracy
• Limit the size of the grammar file.
• Remove trivial tokens from the grammar file.
• All the tokens shown on slide 3 are trivial tokens.
• Trivial tokens can be identified through .wav file training instead of being included in the grammar file.
• This reduces the search space of the grammar file.
• Accuracy increased from 45% to 72%.
• With this, the command and control application is complete.
.WAV file training
• .wav file training is the process of recording short .wav files in the user’s voice to improve the accuracy of the speech recognition application.
• Users are given an interface in which they read a set of lines before starting the speech recognition application.
• The lines consist of words that are trivial for the command and control application, such as open, close, file, computer, document, player, and internet.
• The recognizer first matches a token against the .wav files. If the token is not found there, the grammar file is searched.
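
The lookup order described in the last point can be sketched as a two-stage match. This is a strong simplification: it works on already-recognized text rather than on audio, and the class TwoStageMatcher, its Source enum, and the word sets are hypothetical names introduced for illustration. Both sets are assumed to hold lowercase entries.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch: check the trained words first, then fall back to the grammar. */
final class TwoStageMatcher {
    enum Source { WAV_TRAINING, GRAMMAR }

    private final Set<String> trainedWords;   // words covered by the user's .wav recordings
    private final Set<String> grammarTokens;  // tokens that remain in the grammar file

    TwoStageMatcher(Set<String> trainedWords, Set<String> grammarTokens) {
        this.trainedWords = trainedWords;
        this.grammarTokens = grammarTokens;
    }

    /** Returns where the token was found, or empty if neither stage knows it. */
    Optional<Source> match(String token) {
        String key = token.toLowerCase(Locale.ROOT);
        if (trainedWords.contains(key)) {
            return Optional.of(Source.WAV_TRAINING);
        }
        if (grammarTokens.contains(key)) {
            return Optional.of(Source.GRAMMAR);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        TwoStageMatcher matcher = new TwoStageMatcher(
                Set.of("open", "close", "file"), Set.of("powerpoint", "tasks"));
        System.out.println(matcher.match("open"));
        System.out.println(matcher.match("tasks"));
        System.out.println(matcher.match("banana"));
    }
}
```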
Next task : Dictation
• Dictation differs from command and control: it requires a large number of words to be recognized.
• Dictation should start when the “Start dictation” token is recognized; from then on, microphone input should be treated as keystrokes rather than commands.
• This is a complex task, because the grammar file and .wav file training fail here: the user can say anything, including words that are not present in the grammar file or the .wav files.
Thank You
Voice Recognition (Presentation 3)
By:
Priya Devi A.
S/W Developer,
Xsys technologies
Bangalore.
Dictation Functionality
• Speech dictation treats voice input as text rather than as commands.
• Spoken words are recognized in the same way as in the command and control application.
• Once “Start Dictation” is recognized, every following word is treated as text until the recognizer recognizes “Stop Dictation”.
• After “Stop Dictation” is recognized, the application returns to command and control mode.
• Dictation is implemented using the algorithm given on the next slide.
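
The switching behaviour described above amounts to a small two-state machine. The sketch below is an assumption about how that could look, not the project's code: ModeSwitch and its members are hypothetical, and recognized phrases are passed in as plain strings instead of coming from the recognizer.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: routes recognized phrases to commands or to dictated text. */
final class ModeSwitch {
    enum Mode { COMMAND, DICTATION }

    private Mode mode = Mode.COMMAND;
    final List<String> commands = new ArrayList<>();      // phrases handled as commands
    final StringBuilder dictated = new StringBuilder();   // phrases handled as text

    void onRecognized(String phrase) {
        if (mode == Mode.COMMAND) {
            if (phrase.equalsIgnoreCase("Start Dictation")) {
                mode = Mode.DICTATION;
            } else {
                commands.add(phrase);
            }
        } else {
            if (phrase.equalsIgnoreCase("Stop Dictation")) {
                mode = Mode.COMMAND;
            } else {
                // Separate dictated phrases with a single space.
                if (dictated.length() > 0) {
                    dictated.append(' ');
                }
                dictated.append(phrase);
            }
        }
    }

    Mode mode() {
        return mode;
    }

    public static void main(String[] args) {
        ModeSwitch app = new ModeSwitch();
        for (String p : List.of("open computer", "Start Dictation", "hello", "world",
                                "Stop Dictation", "open player")) {
            app.onRecognized(p);
        }
        System.out.println(app.commands);
        System.out.println(app.dictated);
    }
}
```

Because the control phrases are consumed by the state machine, they can never end up in the dictated text.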
Algorithm Dictation
Changes in Command and control
  word = Recognizer(spoken_word)
  If ( word == “Start Dictation” )
    call function RecognizeDictation()
  Else
    look up word in the hash table and run its action
RecognizeDictation()
  Create object of Robot class present in java.awt package
  While (true)
    Start Recording
    word = Recognizer(spoken_word)
    If ( word == “Stop Dictation” )
      return
    For i = 0 to word.length - 1
      RobotObject.keyPress(keyCode(word.charAt(i)))
      RobotObject.keyRelease(keyCode(word.charAt(i)))
    End For
  End While
(keyCode(c) maps a character to its java.awt.event.KeyEvent VK_ key code)
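
A minimal Java sketch of the keystroke part of this algorithm follows. java.awt.Robot and its keyPress and keyRelease methods are real, and they expect a virtual key code from java.awt.event.KeyEvent. The class DictationTyper and the method keyCodeFor are hypothetical. The mapping covers only letters, digits, and the space character, and it ignores case: without holding Shift, the Robot types letters in lowercase.

```java
import java.awt.Robot;
import java.awt.event.KeyEvent;

/** Hypothetical sketch: types recognized text as keystrokes with java.awt.Robot. */
final class DictationTyper {
    /** Maps a character to a KeyEvent virtual key code, or VK_UNDEFINED if unsupported. */
    static int keyCodeFor(char c) {
        if (c >= 'a' && c <= 'z') return KeyEvent.VK_A + (c - 'a');
        if (c >= 'A' && c <= 'Z') return KeyEvent.VK_A + (c - 'A');
        if (c >= '0' && c <= '9') return KeyEvent.VK_0 + (c - '0');
        if (c == ' ') return KeyEvent.VK_SPACE;
        return KeyEvent.VK_UNDEFINED;
    }

    /** Presses and releases one key per supported character; others are skipped. */
    static void type(Robot robot, String text) {
        for (char c : text.toCharArray()) {
            int code = keyCodeFor(c);
            if (code == KeyEvent.VK_UNDEFINED) {
                continue;
            }
            robot.keyPress(code);
            robot.keyRelease(code);
        }
    }
}
```

A caller would create the Robot once with new Robot() (which throws AWTException on a system without a display) and then call DictationTyper.type(robot, word) for each recognized word; the keystrokes go to whichever window has keyboard focus.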
Open Points
• Paragraph framing for training .wav files.
• Modify the dictation functionality, because the phrase “Stop Dictation” itself cannot be dictated.
• Proper GUI creation with logo and standard design.
• Deployment with the existing system on CentOS.
• Testing on CentOS.
• Code cleanup.
• Complete testing of command and control and dictation.
• Documentation.
