- Avgvocab of a 3 year old-Beam search is an optimization of best-first search that reduces its memory requirements
Transcript of "Speech Recognition"
Matt Oliverio, Shane Conceicao, and Sean Hanley
Bell Laboratories designed in 1952 the "Audrey" system, which recognized digits spoken by a single voice. IBM demonstrated at the 1962 Worlds Fair its “Shoebox" machine, which could understand 16 words spoken in English and solve arithmetic on voice command
U.S. Department of Defense’s DARPA Speech Understanding Research (SUR) program in the 1970’s was responsible for Carnegie Melon’s Harpy system. Harpy could understand 1011 words Harpy was significant because it was the first to use beam search technology A predetermined number of best partial solutions are kept as candidates and it predicts how close it is to complete solution
In the 1980’s speech recognition vocabulary jumped dramatically due to a new statistical method called the Hidden Markov Model Instead of using templates for words and looking for sound patterns, HMM took the probability of unknown sounds being words This gave the potential for speech recognition programs recognize an unlimited number of words
Introduced in 1987 children could train the doll to respond to their voice http://www.youtube.com/watch?feature=play er_embedded&v=UkU9SbIictc
In the 1990’s faster computers made it possible for ordinary people to have speech recognition software In 1990 Dragon Dictate came out for $9000 Seven years later Dragon Naturally Speaking arrived for $695 Could understand words at a natural speed but you had to train it for 45 minutes
By 2001, computer speech recognition had topped out at 80 percent accuracy and progress seemed to stall until the end of the decade Google’s voice search app for the iPhone and Apple’s Siri brought speech recognition back to the forefront
Interact with the calendar. Search contacts. Read and write messages (text and email). Interact with the Maps app and location services. Utilize search providers Can understand English (US, UK, Australia), French, German, and Japanese
Mobile App that allows the user to speak one language into the phone and produces a verbal translation iPhone and Android Thai, Chinese, French, German, Iraqi, Japanese, Korean, Spanish, TagologEnglish German-Spanish
Ford SYNC technology ◦ Music ◦ Directions ◦ Handsfree Calling ◦ http://www.youtube.com/watch?v=My IgbcdOliw Nuance Dragon NaturallySpeaking ◦ Audi, BMW, Fiat, Hyundai, Mercedes, Jaguar, Porsche, Volkswagen ◦ “One-Shot Destination Entry” and full control of the “infotainment system”
Microphone on Kinect Start by saying “Xbox,” and then saying one of the commands on screen Understands English, French, German, Italian, Spanish, Japanese Minimal background noise, clarity important
Medical ◦ Allow doctors to talk into patient’s file to record notes during examinations Court of Law ◦ Record and digitize court proceedings in real time ◦ Reduce time and cost, increase efficiency Educational ◦ Rapid text-to-speech, aiding kids with disabilities ◦ http://tinyurl.com/5wtl8wv
Speeds up “writing” Improvements in spelling Beneficial for the handicapped Physically or Mentally Ability to multitask Frees up physical limitations of using one’s hands
Do you feel that the pros outweigh the cons? Is it worth investing in this software despite current limitations? Does anyone have Siri? Does it actually help you? Would anyone prefer to use the speech-to- text software to write papers?