1. WavFrag Voice Recognition
In today's market, the man/machine
interface is evolving to be increasingly
sophisticated and more friendly to the
user. However, there are still significant
technological challenges to overcome.
For instance, recognizing the user's intent from speech
is still in its infancy. To address these shortcomings, the
WavFrag project was born.
WFR
(C) RobotMonkeySoftware LLC, 2010
2. Addressing the Challenge
There is no doubt that, even with modern technology,
a household pet recognizes more words than most of
today's computers. What is more, the pet recognizes
them more accurately.
Looking at the inaccuracies
and the processing overhead,
the question arises:
Are we overlooking
something in our science?
3. To Boldly Go ...
And so the WavFrag project was born. It attempts
to answer this very question through state-of-the-art
research and development.
With computing capacity measured in billions of
calculations per second, and with high-speed storage
capacity also in the giga range, one would
assume that technology would outperform
the household pet.
Unfortunately, even with all these
advances, one piece of the puzzle is
missing:
- The Right Algorithm -
4. The Heart of the Matter
Take, for instance, the issue of detecting the completion
of a spoken phrase. To ascertain that a phrase is
complete, one needs to run the full recognition
process. But to complete the recognition process,
one first needs to frame the phrase.
Much like the proverbial chicken and egg situation, we
cannot overcome this duality with our current way of
thinking.
After a year of research, we identified the weak points of
the voice recognition chain. Most of them are philosophical.
5. New Way of Thinking
The WavFrag project introduces a new way of thinking,
with considerable success. At its current stage of
development, it is capable of running in real time on a
sub-GHz machine with a sizable dictionary.
The recognition accuracy also represents a
leap forward. For instance, speaking into
a webcam microphone from 6 feet away
will yield full recognition accuracy.
Currently, no other algorithm exhibits
this kind of performance and
adaptation capability.
6. Training
The trainability of WavFrag also represents a
breakthrough. One can train a new phrase, or even
a whole sentence, from just one or two pronunciations
of that phrase or sentence.
The trained word is recognized with
extreme accuracy. WavFrag can easily
distinguish the word 'tree' from the
number 'three'. At the same time, it
can generalize those words to be
recognized by different speakers.
And here is where the new thinking helps.
7. The Doughnut
There is a popular saying that goes something like this:
Concentrate on the doughnut, not the doughnut hole.
It refers to an issue of focus, but applied
to our problem domain, it opened up
several new frontiers, such as framing
utterances by their high-power cycle
(at present, noise is what is observed
for framing), or discarding the less
dominant spectral components (this is
how our brain does its magic).
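The two ideas above can be illustrated with a minimal sketch. This is not the WavFrag algorithm, which this presentation does not disclose. It is a generic stand-in that assumes "high-power framing" can be approximated by thresholding the mean power of fixed-length frames, and that "discarding the less dominant spectral components" can be approximated by keeping only the largest magnitude bins of a spectrum that has already been computed. The names framePower, highPowerSegments, and keepDominant, as well as the frame length and threshold parameters, are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper: mean power (average of squared samples) of each
// non-overlapping frame of frameLen samples. A trailing partial frame
// is dropped.
std::vector<double> framePower(const std::vector<double>& samples,
                               std::size_t frameLen) {
    assert(frameLen > 0);
    std::vector<double> power;
    for (std::size_t start = 0; start + frameLen <= samples.size();
         start += frameLen) {
        double sum = 0.0;
        for (std::size_t i = start; i < start + frameLen; ++i) {
            sum += samples[i] * samples[i];
        }
        power.push_back(sum / static_cast<double>(frameLen));
    }
    return power;
}

// Hypothetical helper: half-open [first, last) frame-index ranges of
// consecutive frames whose power is at or above the threshold. Each
// range is a candidate utterance frame (assumption: loud runs, rather
// than silent gaps, delimit the utterance).
std::vector<std::pair<std::size_t, std::size_t>> highPowerSegments(
        const std::vector<double>& power, double threshold) {
    std::vector<std::pair<std::size_t, std::size_t>> segments;
    std::size_t i = 0;
    while (i < power.size()) {
        if (power[i] >= threshold) {
            const std::size_t begin = i;
            while (i < power.size() && power[i] >= threshold) {
                ++i;
            }
            segments.emplace_back(begin, i);
        } else {
            ++i;
        }
    }
    return segments;
}

// Hypothetical helper: keep the `keep` largest spectral magnitudes and
// zero out every other bin. The input is assumed to be a magnitude
// spectrum computed elsewhere (for example by an FFT).
std::vector<double> keepDominant(const std::vector<double>& magnitudes,
                                 std::size_t keep) {
    std::vector<double> out(magnitudes.size(), 0.0);
    keep = std::min(keep, magnitudes.size());
    if (keep == 0) {
        return out;
    }
    std::vector<std::size_t> idx(magnitudes.size());
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    // Move the indices of the `keep` largest magnitudes to the front.
    std::partial_sort(idx.begin(),
                      idx.begin() + static_cast<std::ptrdiff_t>(keep),
                      idx.end(),
                      [&](std::size_t a, std::size_t b) {
                          return magnitudes[a] > magnitudes[b];
                      });
    for (std::size_t k = 0; k < keep; ++k) {
        out[idx[k]] = magnitudes[idx[k]];
    }
    return out;
}
```

A caller would typically pass the samples through framePower, then highPowerSegments to obtain candidate utterance boundaries, and apply keepDominant to the spectrum of each framed segment before matching. The fixed threshold is the weakest assumption in this sketch; a real system would likely adapt it to the input level.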
8. The New Journey
Although WavFrag development is still in its infancy, it
already outperforms many of the traditional algorithms both
in accuracy and trainability. (All that using a fraction of the
processing power, and no neural net.)
But the fun has just started. WavFrag has indeed
opened up a new frontier. It can run
on most embedded systems, and
porting it is easy. In its current form,
it is suitable for many tasks, such as
controlling non-mission-critical
functions in a cockpit.
9. Applications
The applications are endless. Without much investigation,
we quickly found several obvious deployment avenues like
commanding a medical workstation, enhancing game play
experience, or just controlling a device.
Because we are breaking a new frontier, it is difficult to
envision the breadth and the scope of the difficulties, and the
matching rewards. Given our current state of development,
WavFrag may start a revolution in user interfaces.
Admittedly, there is a bumpy road ahead, but our results
are so encouraging that we have dedicated significant
resources to see it through.
10. Strengths and Advantages
Accurate
Uses less Processing Power
Easy to train
Small Memory footprint
Coded in Standard C++
Easy to Port
Future proof
Available as IP
11. Next Steps of Action
RobotMonkeySoftware LLC
WavFrag has reached a milestone where it is ready for
deployment. We are currently seeking customers who
would like to incorporate WavFrag into their product or
product line. WavFrag is available as Intellectual
Property (IP), Code Library, or Source Code and Training.
Naturally, more resources are needed for developing
WavFrag's next generation. We welcome investors who
would like to participate in this exciting frontier.
12. Contact Information
The author of WavFrag can be contacted at:
PeterGlen@verizon.net
The claims that were made in this presentation can be
verified with our demonstration program. It is available as
a free download at RobotMonkeySoftware.com.
Please follow the voice recognition links for documentation
and download.
Document Prepared by RobotMonkeySoftware LLC, with OpenOffice