Speech Recognition Technology

Presented by:
Nicole Bralic | Sergio Rumantir | Louis Fong | Niharika Kohli | Aamir Sheriff

Agenda
1. Origins / history of speech recognition
2. How it works – the technical aspects
3. Issues and concerns
4. Latest trends and future opportunities
5. Activity

ORIGINS / HISTORY OF SPEECH RECOGNITION

Introduction
Speech Recognition
What is the first thought that comes to mind?

Origins
When was the first speech recognition system developed?
a) 1950s b) 1960s c) 1970s d) 1980s

Origins
Answer: 1950s
First appearance: could only understand spoken digits

Origins
1960s
Understood 16 words spoken in English

Origins
1970s
Understood 1011 words

Origins
1980s
Understood thousands of words, but still slow

Origins
1990s
First comprehensive software
Cost = $9,000

Origins
2000s
Built into Mac OS X and Windows Vista

Origins
2010s
Apple introduces Siri

HOW IT WORKS
The technical aspects

Types of Users
Small Vocabulary / Many Users
Large Vocabulary / Few Users

Speech Recognition Models
Today's speech recognition systems rely on powerful, complex statistical
models that use probability and mathematical functions to determine the
most likely outcome.
The two models that dominate the field today are the Hidden Markov
Model and neural networks.
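"Most likely outcome" is commonly written as the decision rule below. This is a standard textbook formulation, not something stated on the original slide: O is the sequence of acoustic observations, W is a candidate word sequence, and Bayes' rule splits the problem into two parts.

```latex
\hat{W} = \arg\max_{W} P(W \mid O)
        = \arg\max_{W} \frac{P(O \mid W)\,P(W)}{P(O)}
        = \arg\max_{W} P(O \mid W)\,P(W)
```

P(O | W) is the acoustic model (the role an HMM or neural network plays) and P(W) is a language model; P(O) can be dropped because it is the same for every candidate W.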

The Hidden Markov Model
1. Each phoneme is like a link in a chain, and the completed chain is a
word.
2. The chain branches off in different directions as the program attempts
to match the digital sound with the phoneme that's most likely to come
next.
3. The program assigns a probability score to each phoneme, based on its
built-in dictionary and user training.
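The chain-of-phonemes idea can be sketched as a toy discrete HMM decoded with the Viterbi algorithm, which keeps only the most probable branch into each phoneme at every step. Everything here is hypothetical: the phonemes of "hello", the quantized sound classes s1 to s4, and all probabilities are invented for illustration and do not come from any real recognizer.

```python
import math

# Hypothetical toy model: hidden states are phonemes ("links in the chain"),
# observations are quantized acoustic sound classes.
STATES = ["h", "eh", "l", "ow"]
START = {"h": 0.7, "eh": 0.1, "l": 0.1, "ow": 0.1}
# P(next phoneme | current phoneme): mostly stay, or move to the next link.
TRANS = {
    "h":  {"h": 0.5,  "eh": 0.4,  "l": 0.05, "ow": 0.05},
    "eh": {"h": 0.05, "eh": 0.5,  "l": 0.4,  "ow": 0.05},
    "l":  {"h": 0.05, "eh": 0.05, "l": 0.5,  "ow": 0.4},
    "ow": {"h": 0.05, "eh": 0.05, "l": 0.05, "ow": 0.85},
}
# P(sound class | phoneme): the probability score for matching a sound.
EMIT = {
    "h":  {"s1": 0.7, "s2": 0.1, "s3": 0.1, "s4": 0.1},
    "eh": {"s1": 0.1, "s2": 0.7, "s3": 0.1, "s4": 0.1},
    "l":  {"s1": 0.1, "s2": 0.1, "s3": 0.7, "s4": 0.1},
    "ow": {"s1": 0.1, "s2": 0.1, "s3": 0.1, "s4": 0.7},
}


def viterbi(observations):
    """Return the most likely phoneme sequence and its log-probability."""
    # best[t][s]: best log-prob of any path that ends in phoneme s at time t.
    best = [{s: math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in STATES}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in STATES:
            # Keep only the best branch leading into phoneme s.
            prev, score = max(
                ((p, best[t - 1][p] + math.log(TRANS[p][s])) for p in STATES),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(EMIT[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final phoneme.
    last = max(best[-1], key=best[-1].get)
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


path, logp = viterbi(["s1", "s2", "s2", "s3", "s4", "s4"])
print(path)  # → ['h', 'eh', 'eh', 'l', 'ow', 'ow']
```

Log-probabilities are used instead of raw products so that long utterances do not underflow to zero.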

Neural Network
There are four basic steps to performing recognition:
1. Digitize the speech that we want to recognize.
2. Compute features that represent the spectral-domain content of the
speech.
3. Use a neural network (an artificial neural network, or ANN; typically a
multilayer perceptron, or MLP) to classify a set of these features into
phonetic-based categories at each frame.
4. Use a Viterbi search to match the neural-network output scores to the
target words, in order to determine the word that was most likely uttered.
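Steps 1 to 3 can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: a synthetic 440 Hz tone stands in for recorded speech, the 8 kHz rate and 25 ms / 10 ms framing are assumed values, coarse log band energies are a simplified stand-in for real features such as MFCCs, and the three phonetic classes and random MLP weights are hypothetical placeholders (a real recognizer trains them). Step 4, the Viterbi search, is not included.

```python
import cmath
import math
import random

SAMPLE_RATE = 8000   # assumed sampling rate in Hz
FRAME_LEN = 200      # 25 ms analysis frames at 8 kHz (assumed)
HOP = 80             # 10 ms step between frames (assumed)
PHONE_CLASSES = ["silence", "vowel", "fricative"]  # hypothetical categories


def digitize(duration_s=0.1, freq_hz=440.0):
    """Step 1 stand-in: sample a synthetic tone instead of a microphone."""
    n = int(SAMPLE_RATE * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def frame_features(samples, n_bands=8):
    """Step 2: log energy in coarse frequency bands for each frame."""
    half = FRAME_LEN // 2
    band = half // n_bands
    feats = []
    for start in range(0, len(samples) - FRAME_LEN + 1, HOP):
        frame = samples[start:start + FRAME_LEN]
        # Hamming window to reduce spectral leakage at the frame edges.
        windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (FRAME_LEN - 1)))
                    for i, x in enumerate(frame)]
        # Naive DFT magnitudes for the non-negative frequencies.
        mags = [abs(sum(w * cmath.exp(-2j * math.pi * k * i / FRAME_LEN)
                        for i, w in enumerate(windowed)))
                for k in range(half)]
        feats.append([math.log(sum(m * m for m in mags[b * band:(b + 1) * band]) + 1e-10)
                      for b in range(n_bands)])
    return feats


def make_mlp(n_in, n_hidden, n_out, seed=0):
    """Random placeholder weights; a real recognizer would train these."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
    return w1, [0.0] * n_hidden, w2, [0.0] * n_out


def mlp_posteriors(x, params):
    """Step 3: one hidden tanh layer, then softmax over phonetic classes."""
    w1, b1, w2, b2 = params
    hidden = [math.tanh(sum(w * v for w, v in zip(row, x)) + b) for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(w2, b2)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


samples = digitize()
features = frame_features(samples)
params = make_mlp(n_in=8, n_hidden=16, n_out=len(PHONE_CLASSES))
posteriors = [mlp_posteriors(f, params) for f in features]
for t, p in enumerate(posteriors):
    best = max(range(len(p)), key=p.__getitem__)
    print(t, PHONE_CLASSES[best], [round(v, 3) for v in p])
```

Each frame ends up with one probability per phonetic category; these per-frame scores are what a Viterbi search would then match against the target words. A naive DFT keeps the sketch dependency-free; a real system would use an FFT.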

Issue: Accuracy & Performance
How accurate was the performance of Siri?
What caused this lack of accuracy?

Why were Siri's accuracy and performance low in the previous video?
● Background noise
● Overlapped speech
● Speaker’s accent
● Syntactic errors
● iPhone 4S and Siri weren’t advanced enough

● Technological improvement
● More vocabulary, lower accuracy
● Systems can perfectly recognize “one” to “nine”, but as the vocabulary
grows, some words become easy to confuse
● Speaker-dependent vs. speaker-independent
● Isolated, discontinuous, continuous (natural) speech
● Read vs. spontaneous speech
● Cannot understand a sentence that is badly off syntactically
● Adverse conditions: noise, distortion

Accuracy enhanced!

Issue: Privacy
Google Chrome: your PC becomes an open microphone
Wearable technology in the workplace: should not be used to monitor
employees
Facebook Music and TV Recognition: is it really turned off?

Issue: Control
As technology advances quickly, is government
legislation good enough to control the proper
usage of speech recognition software?
Is it even possible to control?

LATEST TRENDS AND FUTURE OPPORTUNITIES

Nuance Communications
Computer software technology corporation
Market leader in speech and imaging applications:
o server & embedded speech recognition
o telephone call steering systems
o automated telephone directory services
o medical transcription software & systems
o optical character recognition
o desktop imaging

Dragon NaturallySpeaking
• World’s best-selling speech recognition software
• For home, student, power and professional users
• Essential for people with visual impairments

Using Python to Code by Voice
• The beginning: Tavis Rudd developed “Emacs pinkie”, a repetitive
strain injury (RSI)
• Months of coding in Python and Emacs
• Dragon NaturallySpeaking voice recognition software on Microsoft
Windows
• Over 2000 personal commands of his own
• The code has been released for download
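The idea of personal voice commands can be illustrated with a tiny phrase-to-text table. This is a hypothetical sketch, not Tavis Rudd's released code: the command phrases, the `COMMANDS` table and the `expand` function are invented for illustration, and the sketch starts from words that a recognizer such as Dragon NaturallySpeaking would already have produced.

```python
# Hypothetical command table (not Tavis Rudd's actual grammar):
# each spoken phrase maps to the text it should insert in the editor.
COMMANDS = {
    "define function": "def ",
    "import module": "import ",
    "new line": "\n",
    "end block": ":\n    ",
}

# Pre-split phrases once, longest first, so longer commands win
# when two commands share a prefix.
PHRASES = sorted((p.split() for p in COMMANDS), key=len, reverse=True)


def expand(utterance):
    """Convert recognized words into editor text.

    Words that start a known command phrase are replaced by that command's
    text; any other word is dictated literally.
    """
    words = utterance.lower().split()
    out = []
    prev_literal = False
    i = 0
    while i < len(words):
        match = next((p for p in PHRASES if words[i:i + len(p)] == p), None)
        if match is not None:
            out.append(COMMANDS[" ".join(match)])
            i += len(match)
            prev_literal = False
        else:
            if prev_literal:
                out.append(" ")  # separate consecutive dictated words
            out.append(words[i])
            i += 1
            prev_literal = True
    return "".join(out)


print(repr(expand("import module os new line define function main end block")))
# → 'import os\ndef main:\n    '
```

A real setup would attach commands to a recognizer grammar rather than post-process plain text, but the mapping from short spoken phrases to code fragments is the same idea.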

Dragon Drive
Nuance combines cloud-based and on-board vehicle capabilities in Dragon
Drive, so drivers can use voice commands for distraction-free driving.
Over 90 million cars are currently equipped with Nuance Dragon Drive.

Dragon Medical
Dragon Medical provides clinical documentation solutions for over 300,000
physicians. The portfolio captures the physician narrative to document care
in the EHR (electronic health record): anywhere, any time and on any device.

Real-time Skype Translator
Microsoft will release the first beta of real-time
Skype Translator to Windows 8 before the end of
2014.
Microsoft is currently implementing near real-time voice translation
between multiple languages in Skype calls.
Instant functional translation from English to German and Chinese is
already available.

Future Trends
Voice recognition market to reach
US$2.5 billion in revenue by 2015
Typed passwords aren’t going to
work in the future

Class Discussion
• What do you think the future of speech
recognition technology will look like?
• What are some other uses of this
technology?
• Do you think the benefits outweigh the
issues?
