Field: Computational linguistics 
Topic: Speech Recognition 
How did SR come to be, and how does it 
work? How beneficial is it for 
students? 
Presenter: 
AMADOU ADAMOU AMINOU
Have you ever talked to your 
computer or your phone, and it 
actually recognized what you 
said and then did something as a 
result?
Contents 
• History 
• Introduction 
• Types of Speech Recognition 
• Components of SR 
• Applications 
• Examples of SR 
• Weakness and flaws 
• Conclusion 
• References
INTRODUCTION 
WHAT IS SPEECH RECOGNITION? 
• Speech recognition is a process by which a 
computer takes a speech signal (recorded 
using a microphone) and converts it into 
words in real-time. 
• SR simply is the process of converting spoken 
input to text. 
PELLOM, B. Sonic: The University of Colorado Continuous Speech 
Recognition System. 2001
History 
• The first speech recognition system was invented by Bell 
Laboratories in the US in 1952. It could understand only digits 
spoken by a single voice. The system was called “AUDREY”. 
• Ten years later, labs in the US, Japan, and England developed 
hardware dedicated to recognizing spoken sounds. 
• Modern systems have large vocabularies and can recognize 
continuous speech. 
Eugene Weinstein - Speech Recognition, p. 15, Prentice Hall 1995
Why were SR systems invented? 
• Individuals with disabilities – Assists those who have a visual 
impairment, hand immobility, dyslexia, etc. 
• Medical transcription – Reduces delays in writing out 
medical transcriptions. 
• Dictation – Converts spoken words to text in emails or other 
documents (also helpful for English language learners). 
• Access to menu commands – Opens files using voice commands. 
Eugene Weinstein - Speech Recognition, pp. 40-50, Prentice Hall 1995
A speech recognition system consists of: 
• A microphone. 
• Speech recognition software. 
• A computer to take in and interpret the speech. 
• A good-quality sound card for input and output. 
• Clear, correct pronunciation from the speaker. 
BOGUREV, B T. (1989) Computational Linguistics, London: Longman
Two types of SR 
• Speaker-dependent systems 
– Require “training” to “teach” the system an individual's voice 
– More robust 
– But less convenient 
– And less portable between speakers 
• Speaker-independent systems 
– Language coverage is reduced to compensate for the need to be 
flexible in phoneme identification 
– A clever compromise is to learn on the fly 
Eugene Weinstein - Speech Recognition, pp. 40-45, Prentice Hall 1995
Components 
• Audio input 
• Grammar 
• Speech Recognition Engine 
• Acoustic Model 
• Recognized text
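
How these components fit together can be sketched with a toy example. Everything here is an assumption made for illustration: the scores are hand-made rather than computed from audio, and names such as ACOUSTIC_SCORES, GRAMMAR, and recognize are hypothetical, not taken from any real SR product.

```python
# Toy illustration only: hypothetical names, hand-made scores, no real audio.

# "Acoustic model" output: for each spoken segment, a score per candidate word.
# In a real system these scores would be computed from the audio input.
ACOUSTIC_SCORES = [
    {"open": 0.6, "oven": 0.4},
    {"file": 0.45, "five": 0.55},
]

# "Grammar": the word sequences the application is willing to accept.
GRAMMAR = {("open", "file"), ("close", "file")}


def recognize(acoustic_scores, grammar):
    """Engine: return the grammar-allowed sequence with the best total score."""
    best_words, best_score = None, 0.0
    for words in grammar:
        if len(words) != len(acoustic_scores):
            continue  # this sentence cannot match the number of segments
        score = 1.0
        for word, segment in zip(words, acoustic_scores):
            score *= segment.get(word, 0.0)  # unseen word contributes zero
        if score > best_score:
            best_words, best_score = words, score
    return " ".join(best_words) if best_words else None


recognized_text = recognize(ACOUSTIC_SCORES, GRAMMAR)
print(recognized_text)  # open file
```

Taken segment by segment, the best-scoring words would be "open five". The grammar rules that sequence out, so the engine returns "open file" as the recognized text.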
What’s hard about that? 
• Digitization 
– Converting analogue signal into digital representation. 
• Signal processing 
– Separating speech from background noise. 
• Phonetics 
– Variability in human speech. 
• Phonology 
– Recognizing individual sound distinctions (similar phonemes). 
• Lexicology and syntax 
– Disambiguating homophones. 
– Features of continuous speech. 
• Syntax and pragmatics 
– Interpreting features. 
– Filtering of performance errors (disfluencies). 
Eugene Weinstein - Speech Recognition, pp. 15-67, Prentice Hall 1995
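
The digitization step can be illustrated with a minimal sketch. It assumes a pure 1000 Hz tone in place of real speech, an 8000 Hz sample rate, and 8-bit samples; the function name digitize and all parameter values are chosen for this example only.

```python
import math


def digitize(frequency_hz, sample_rate_hz, n_samples, bits):
    """Sample a sine wave and quantize each sample to a signed integer."""
    max_level = 2 ** (bits - 1) - 1  # largest positive level, e.g. 127 for 8 bits
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz  # time of the n-th sample, in seconds
        amplitude = math.sin(2 * math.pi * frequency_hz * t)  # analogue value in -1..1
        samples.append(round(amplitude * max_level))  # nearest digital level
    return samples


samples = digitize(frequency_hz=1000, sample_rate_hz=8000, n_samples=8, bits=8)
print(samples)
```

Sampling decides how often the signal is measured, and quantization decides how finely each measurement is stored. Both lose information, which is one reason recognition is hard even before noise and speaker variability are considered.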
Potential uses in education 
• Teaching students of foreign languages to 
pronounce vocabulary correctly. 
• Enabling students with physical disabilities 
who cannot use a keyboard to enter text. 
• Enabling students with text-interpretation 
difficulties, e.g. dyslexia, to enter text verbally. 
• Restricting access to high-security computers, 
where a keyboard could be used by hackers. 
BOGUREV, B T. (1989) Computational Linguistics, London: Longman
Applications of Speech Recognition 
• Speech recognition applications include: 
– Controlling devices in your home (e.g., video). 
– Call routing (e.g., "I would like to make a collect call"). 
– Simple data entry (e.g., entering a credit card number). 
– Preparation of structured documents (e.g., a radiology 
report). 
– Speech-to-text processing (e.g., word processors or emails). 
– Aircraft cockpits (usually termed Direct Voice Input). 
BOGUREV, B T. (1989) Computational Linguistics, London: Longman
Example: Microsoft Speech 
Recognition – Windows 7
SIRI & GOOGLE 
Siri is an intelligent personal assistant 
developed by Apple. 
Google Now is an intelligent personal 
assistant developed by Google. 
Both use a combination of speaker-dependent and speaker-independent 
SR systems.
Weakness and Flaws 
• Low signal-to-noise ratio: the program needs to 
hear the words spoken distinctly. 
• Intensive use of device power. 
• Homophones, e.g. “there” and “their”, “be” and 
“bee”. 
• Overlapping speech. 
• No program is 100% accurate. 
• Difficulty understanding dialects and accents. 
Example: see video 
Geoff Barnbrook, A Practical Intro to the Computer Analysis of Language, 
1996, Edinburgh University Press.
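
The signal-to-noise ratio mentioned above is usually expressed in decibels as ten times the base-10 logarithm of signal power over noise power. A minimal sketch, assuming the speech and the background noise are available as separate lists of samples (in practice they are mixed in one recording); the function name snr_db and the sample values are made up for the example.

```python
import math


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels from two lists of samples."""
    signal_power = sum(x * x for x in signal) / len(signal)  # mean squared value
    noise_power = sum(x * x for x in noise) / len(noise)
    return 10 * math.log10(signal_power / noise_power)


speech = [1.0, -1.0, 1.0, -1.0]   # made-up "speech" samples
hiss = [0.1, -0.1, 0.1, -0.1]     # made-up background noise, ten times quieter
print(round(snr_db(speech, hiss), 1))  # 20.0
```

A lower value means the noise is closer in strength to the speech, which makes the words harder for the program to separate.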
Conclusion 
• Speech recognition can change the way people 
conduct business over the Web and can 
differentiate world-class e-businesses. 
• VoiceXML ties speech recognition and 
telephony together. 
• At some point in the future, speech 
recognition may become speech 
understanding. 
• Voice-enabled Web solutions are available today.
References 
• PELLOM, B., Sonic: The University of Colorado 
Continuous Speech Recognition System, 2001 
• BOGUREV, B T. (1989) Computational Linguistics, 
London: Longman 
• Eugene Weinstein - Speech Recognition, Prentice 
Hall 1995 
• http://www.tldp.org/HOWTO/Speech-Recognition- 
• Geoff Barnbrook, A Practical Intro to the Computer 
Analysis of Language, 1996, Edinburgh 
University Press.
