2. Preparing Grammar
The grammar file has been extended to 56 tokens.
The grammar file can be generated dynamically.
A user interface for entering grammar tokens and
their actions is implemented.
Tokens entered into the grammar file are
recognized by the Sphinx recognizer when detected in
microphone input.
Actions are associated with tokens and stored in
a hash table.
The grammar file follows the JSGF format.
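The token-to-action hash table described above could look roughly like the following. This is a minimal sketch, not the application's actual code: the class name CommandTable, its methods, and the case-insensitive matching are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: a hash table from recognized tokens to actions.
final class CommandTable {
    private final Map<String, Runnable> actions = new HashMap<>();

    // Register an action under a token (stored trimmed and lower-cased).
    void register(String token, Runnable action) {
        actions.put(normalize(token), action);
    }

    // Run the action for a recognized token.
    // Returns false when no action is registered for it.
    boolean dispatch(String recognized) {
        Runnable action = actions.get(normalize(recognized));
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    // Assumption: tokens are matched case-insensitively.
    private static String normalize(String token) {
        return token.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        CommandTable table = new CommandTable();
        table.register("open computer", () -> System.out.println("opening Computer"));
        table.dispatch("Open Computer");
    }
}
```

Storing actions as Runnable keeps the lookup independent of what each command actually does.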
3. JSGF (Java Speech Grammar Format)
The JSpeech Grammar Format (JSGF) is a
platform-independent, vendor-independent textual
representation of grammars for use in speech
recognition.
An example token definition in JSGF is as
follows:
public <desktopAction> = open (Computer | Document | Recycle |
Network | <defaultApplication> );
public <defaultApplication> = player | word | powerpoint | internet |
start | tasks ;
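Since the grammar file can be generated dynamically, rules like the ones above can be assembled from token lists. The following is a minimal sketch under stated assumptions: the class JsgfWriter and the grammar name "commands" are hypothetical, and only simple alternative lists are handled. The "#JSGF V1.0;" header and the "grammar" declaration are part of the JSGF format itself.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: builds JSGF text from token lists.
final class JsgfWriter {
    // Builds one public rule of the form: public <name> = a | b | c;
    static String rule(String name, List<String> alternatives) {
        return "public <" + name + "> = " + String.join(" | ", alternatives) + ";";
    }

    // Wraps rules in the JSGF header and a grammar declaration.
    static String grammar(String grammarName, List<String> rules) {
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n");
        sb.append("grammar ").append(grammarName).append(";\n");
        for (String r : rules) {
            sb.append(r).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String apps = rule("defaultApplication",
                Arrays.asList("player", "word", "powerpoint", "internet", "start", "tasks"));
        // "commands" is an assumed grammar name, not taken from the slides.
        System.out.print(grammar("commands", Arrays.asList(apps)));
    }
}
```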
4. Major Challenge - Accuracy
Accuracy is currently only 45%.
Accuracy depends on many factors, such as noise and
microphone quality.
Accuracy depends heavily on the recognizer.
The recognizer searches the grammar file for tokens
using a best-first scheme.
The best-first scheme fails when the textual
comparison picks the wrong match, e.g. “word” can be
recognized as “ward”.
5. Improving Accuracy
Limit the size of the grammar file.
Remove trivial tokens from the grammar file.
All the tokens shown on slide 3 are trivial tokens.
Trivial tokens can be identified through .wav file training
instead of being included in the grammar file.
This reduces the search space of the grammar file.
Accuracy increased to 72%.
With this, the command and control application is
complete.
6. .wav File Training
.wav file training is the process of recording short .wav
files in the user’s voice to improve the accuracy of the
speech recognition application.
Users are given an interface to read a set of
lines before starting the speech recognition
application.
The set of lines consists of words that are trivial for the
command and control application, such as open, close,
file, computer, document, player, internet.
The recognizer first matches a token against the .wav files.
If the token is not found there, the grammar file is
searched.
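The lookup order described above (trained .wav words first, grammar file second) can be sketched as follows. This is a simplification under stated assumptions: real matching against .wav recordings is acoustic, while this hypothetical TwoStageLookup class models both stages as plain set membership only to show the fallback order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the two-stage lookup order.
final class TwoStageLookup {
    enum Source { WAV_TRAINING, GRAMMAR, NONE }

    private final Set<String> trainedWords;   // words covered by .wav training
    private final Set<String> grammarTokens;  // tokens kept in the grammar file

    TwoStageLookup(Set<String> trainedWords, Set<String> grammarTokens) {
        this.trainedWords = trainedWords;
        this.grammarTokens = grammarTokens;
    }

    // Check the trained words first; search the grammar only on a miss.
    Source find(String token) {
        if (trainedWords.contains(token)) {
            return Source.WAV_TRAINING;
        }
        if (grammarTokens.contains(token)) {
            return Source.GRAMMAR;
        }
        return Source.NONE;
    }

    public static void main(String[] args) {
        TwoStageLookup lookup = new TwoStageLookup(
                new HashSet<>(Arrays.asList("open", "close", "file")),
                new HashSet<>(Arrays.asList("powerpoint", "tasks")));
        System.out.println(lookup.find("open"));
        System.out.println(lookup.find("tasks"));
    }
}
```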
7. Next Task: Dictation
Dictation differs from command and control: it
requires a large number of words to be recognized.
Dictation should start when the “Start dictation” token is
recognized; from then on, microphone input should be
treated as keystrokes rather than as commands.
This is a complex task because the grammar file and .wav file
training fail here: the user can say anything, including
words not present in the grammar file or the .wav files.
10. Dictation Functionality
Speech dictation treats voice input as
text rather than as commands.
Spoken words are recognized in the same way as in the
command and control application.
Once the spoken words are recognized as “Start Dictation”,
all following words are treated as text until the recognizer
recognizes “Stop Dictation”.
After recognizing “Stop Dictation”, the application returns
to command and control mode.
Dictation is implemented using the algorithm given on the
next slide.
11. Dictation Algorithm
Changes in command and control:
    If ( Recognizer(spoken_word) == “Start Dictation” )
        call function RecognizeDictation()
    Else
        look up the recognized word in the hash table
RecognizeDictation:
    Create an object of the Robot class from the java.awt package
    While (true)
        Start recording
        recognized_word = Recognizer(spoken_word)
        If ( recognized_word != “Stop Dictation” )
            For i = 0 to recognized_word.length - 1
                key = key code (KeyEvent.VK_*) for recognized_word.charAt(i)
                RobotObject.keyPress(key)
                RobotObject.keyRelease(key)
            End For
        Else
            return
    End While
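The dictation algorithm can be sketched in Java as below. This is an illustrative sketch under several assumptions, not the application's code: the class DictationMode and the KeySink interface are hypothetical, only letters, digits and spaces are typed, and a space is appended after each phrase so that words do not run together. java.awt.Robot expects KeyEvent.VK_* key codes rather than raw character values; VK_A to VK_Z and VK_0 to VK_9 share their values with ASCII 'A' to 'Z' and '0' to '9'.

```java
import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.KeyEvent;

// Hypothetical sketch: switches between command mode and dictation mode.
final class DictationMode {
    // Receives key codes to type, so the mode logic works without a display.
    interface KeySink { void type(int keyCode); }

    private final KeySink sink;
    private boolean dictating = false;

    DictationMode(KeySink sink) { this.sink = sink; }

    boolean isDictating() { return dictating; }

    // Returns true if the phrase was handled here, false if the caller
    // should look it up in the command hash table instead.
    boolean accept(String phrase) {
        if (!dictating) {
            if (phrase.equalsIgnoreCase("start dictation")) {
                dictating = true;
                return true;
            }
            return false;
        }
        if (phrase.equalsIgnoreCase("stop dictation")) {
            dictating = false;
            return true;
        }
        // Assumption: append a space so consecutive phrases stay separated.
        for (char c : (phrase + " ").toCharArray()) {
            int code = keyCodeFor(c);
            if (code != KeyEvent.VK_UNDEFINED) {
                sink.type(code);
            }
        }
        return true;
    }

    // Maps a character to a VK_* code; unsupported characters are skipped.
    static int keyCodeFor(char c) {
        char u = Character.toUpperCase(c);
        if (u >= 'A' && u <= 'Z') return KeyEvent.VK_A + (u - 'A');
        if (c >= '0' && c <= '9') return KeyEvent.VK_0 + (c - '0');
        if (c == ' ') return KeyEvent.VK_SPACE;
        return KeyEvent.VK_UNDEFINED;
    }

    // Sink backed by java.awt.Robot; needs a graphical session to construct.
    static KeySink robotSink() throws AWTException {
        Robot robot = new Robot();
        return code -> {
            robot.keyPress(code);
            robot.keyRelease(code);
        };
    }
}
```

On a desktop session the mode would be built with new DictationMode(DictationMode.robotSink()) and fed each phrase returned by the recognizer. Creating the Robot once, rather than per phrase, avoids repeated setup.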
12. Open Points
Paragraph framing for training .wav files.
Modification of the dictation functionality, since “Stop Dictation”
cannot itself be dictated.
Proper GUI creation with logo and standard design.
Deployment with the existing system on CentOS.
Testing on CentOS.
Code cleanup.
Complete testing of command and control and dictation.
Documentation.