o Short answer: YES!
o Most failed voice implementations can be traced to the voice recognizer software that “listens to” and interprets what the worker says.
o Ignoring the quality of the voice recognition software can lead to a failed implementation.
Dozens of recognizers on the market today are designed for a “controlled” environment (little to no background noise). Warehouses are the most challenging places for a voice recognizer to work. Warehouses also employ a diverse workforce with different native tongues and accents. If the recognizer makes a mistake, the worker must repeat the response, and productivity suffers.
o Voxware offers 99.9% recognition accuracy with its voice technology.
o The Voxware Integrated Speech Engine (VISE) is designed to operate in very noisy settings without compromising accuracy. VISE has been refined over 25 years.
o Voxware ensures consistently high recognition rates regardless of the mobile device being used, the language, or the environmental circumstances.
o In an apparel warehouse near Atlanta, a Voxware voice solution accurately recognizes workers who speak five different languages: English, Spanish, Bosnian, Vietnamese, and Somali.
Speaker-dependent recognizers are “trained” to recognize the way a specific user says a vocabulary of words. Speaker-independent recognizers do not require training and are widely used for customer service applications (e.g. airline reservations).
o VISE leads users through a training session and creates a voice profile for each person that is specific to that person’s way of speaking (e.g. accents).
o Speaker-dependent recognition raises accuracy from 95% to 99.9%. This difference is huge in terms of ROI from a voice implementation.
o Training is time-consuming, but research shows that the time gained by skipping training is lost in the first week of production use because of misrecognitions. This amounts to $20,000 of wasted worker time in a medium-size DC.
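To make the gap between 95% and 99.9% accuracy concrete, the arithmetic can be sketched as follows. The number of spoken responses per shift and the seconds lost per repeat are hypothetical figures chosen only for illustration; they do not come from Voxware or from the research cited above.

```python
def misrecognitions(responses: int, accuracy: float) -> float:
    """Expected number of responses the recognizer gets wrong."""
    return responses * (1.0 - accuracy)


# Hypothetical workload figures, assumed for illustration only.
RESPONSES_PER_SHIFT = 2000   # spoken responses per worker per shift
SECONDS_PER_REPEAT = 3.0     # time lost each time a worker must repeat

for accuracy in (0.95, 0.999):
    errors = misrecognitions(RESPONSES_PER_SHIFT, accuracy)
    lost_minutes = errors * SECONDS_PER_REPEAT / 60.0
    print(f"accuracy {accuracy:.1%}: about {errors:.0f} repeats, "
          f"{lost_minutes:.1f} minutes lost per worker per shift")
```

Under these assumed figures, a 95% recognizer forces about 100 repeats per worker per shift, while a 99.9% recognizer forces about 2. Multiplied across a workforce and a work week, that gap is the kind of loss the bullets above describe.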
Many voice recognizers try to block out background noise with “noise-reducing” microphones. This does not ensure recognition accuracy. Why? DCs have too much fluctuating noise for a “noise-reducing” microphone to handle.
VISE “listens” for background noise and eliminates it, which allows the recognizer to process only what the worker said. According to Voxware, VISE has run in some of the loudest operations imaginable (sawmills, airport runways) and still recognizes what users say with near 100% accuracy.
VISE is optimized to recognize phrases as opposed to individual words. Continuous recognition enhances productivity because workers can combine into one response what would ordinarily take two to three interactions using a discrete word recognizer. For example: “Check 457 Grab 6 Put to Alpha”. Other systems would have to break this up into as many as three interactions.
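The example phrase can be used to illustrate why one continuous response replaces several discrete interactions. The sketch below assumes the recognizer has already produced a text transcript, and it uses a hypothetical regular-expression grammar and hypothetical field names; it is not a description of how VISE works internally.

```python
import re

# Hypothetical phrase grammar: "check <digits> grab <qty> put to <destination>".
PHRASE = re.compile(
    r"^check (?P<check_digits>\d+) grab (?P<quantity>\d+) "
    r"put to (?P<destination>\w+)$",
    re.IGNORECASE,
)


def parse_pick_phrase(transcript: str):
    """Split one continuous utterance into its three data items.

    Returns None when the transcript does not match the expected phrase.
    """
    match = PHRASE.match(transcript.strip())
    if match is None:
        return None
    return {
        "check_digits": match.group("check_digits"),
        "quantity": int(match.group("quantity")),
        "destination": match.group("destination"),
    }


result = parse_pick_phrase("Check 457 Grab 6 Put to Alpha")
print(result)
```

A discrete word recognizer would need a separate prompt and response for the check digits, the quantity, and the destination. Here all three are captured from a single utterance.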
VISE always “knows” what it is listening for. It ignores idle chitchat and waits for the expected response. This is called “out of vocabulary rejection”. Recognizers that lack “out of vocabulary rejection” will interpret everything they hear as input, including overhead paging that is loud enough to be picked up.
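The idea of out of vocabulary rejection can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name, the confidence score, and the 0.8 threshold are hypothetical and are not taken from VISE. It assumes the recognizer hands back a transcript with a confidence value, and that the application knows which responses are valid at the current prompt.

```python
from typing import Optional, Set


def accept_response(
    transcript: str,
    confidence: float,
    expected: Set[str],
    min_confidence: float = 0.8,  # hypothetical threshold
) -> Optional[str]:
    """Return the normalized response if it is expected, otherwise None.

    Anything outside the expected vocabulary (chitchat, overhead paging)
    or below the confidence threshold is rejected rather than treated
    as input.
    """
    normalized = " ".join(transcript.lower().split())
    if normalized in expected and confidence >= min_confidence:
        return normalized
    return None


# Responses that are valid at the current prompt (hypothetical vocabulary).
expected_now = {"ready", "repeat", "skip"}

print(accept_response("Ready", 0.93, expected_now))
print(accept_response("did you see the game", 0.95, expected_now))
```

The first call is accepted because "ready" is in the expected set; the second is rejected even though its confidence is high, because the phrase is not something the application is waiting for.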
Anyone who knows VoiceXML (used to develop voice applications) could create an application to interact with VISE. Since VISE is open and standards-based, Voxware can use a different VoiceXML recognizer if one is found that could deliver better performance than VISE.
Voxware’s software is hardware independent. Customers are able to port their voice applications to new devices without the need to rewrite any code. Voxware works with hardware manufacturers to help them produce units with the requisite audio performance and quality.