Speech Recognizers & Generators 
Let’s Get Started… 
Presented by: P. Kahoro 
Presented to: Prof P. Okanda
Speech Recognizers: What are they?
Speech is the vocalized form of human communication.
In computer science and electrical engineering, speech recognition (SR), also known as "automatic speech recognition" (ASR), is the translation of spoken words or dictation into text.
Speech recognition has evolved considerably over the years. Early systems worked in discrete dictation mode, where you had to pause between each spoken word. Today's systems accept continuous dictation. They have also become smarter, applying their own set of grammar rules to make out the meaning of what is being said.
Terms and Concepts 
•Utterances 
•Pronunciation
•Grammar
•Speaker Dependent System 
•Speaker Independent System 
•Training 
•Accuracy
Terms & Concepts
Utterances: 
An utterance is any stream of speech between two periods of silence. Silence delineates the start and end of an utterance. An utterance can be a single word, or it can contain multiple words (a phrase or a sentence).
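As a rough illustration of this definition, the sketch below splits a list of amplitude samples into utterances wherever a run of near-silent samples occurs. The function name `split_utterances`, the threshold and the minimum silence length are assumptions made for the example, not values from any particular speech engine.

```python
def split_utterances(samples, threshold=0.1, min_silence=3):
    """Return (start, end) index pairs of speech runs separated by silence.

    A sample counts as silence when its absolute amplitude is below
    `threshold`; an utterance ends once `min_silence` silent samples
    occur in a row. Both values are assumed for illustration.
    The `end` index is exclusive.
    """
    utterances = []
    start = None      # index where the current utterance began
    silent_run = 0    # consecutive silent samples seen inside an utterance
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence:
                # The utterance ended just after the last non-silent sample.
                utterances.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # The input ended while an utterance was still open.
        utterances.append((start, len(samples) - silent_run))
    return utterances
```

A short pause (fewer than `min_silence` quiet samples) stays inside the utterance, which is why a phrase or sentence can form a single utterance.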
Pronunciations: 
One piece of information that the speech recognition engine uses to process a word is its pronunciation, which represents what the speech engine thinks a word should sound like. 
Words can have multiple pronunciations associated with them. For example, the word “the” has at least two pronunciations in the U.S. English language: “thee” and “thuh”.
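One way to picture this is a lookup table from words to their accepted pronunciations. The sketch below is only illustrative: the table `LEXICON`, the helper `words_for` and the informal spelling used for "dial" are assumptions, and a real engine would use a phonetic alphabet rather than plain spellings.

```python
# Hypothetical pronunciation lexicon: each word maps to one or more
# pronunciations, written here as informal spellings.
LEXICON = {
    "the": ["thee", "thuh"],   # the two pronunciations named in the text
    "dial": ["dy-uhl"],        # assumed entry, for illustration only
}

def words_for(pronunciation, lexicon=LEXICON):
    """Return every word whose pronunciation list contains the given one."""
    return sorted(w for w, prons in lexicon.items() if pronunciation in prons)
```

Both "thee" and "thuh" resolve to the same word, so the engine can recognize "the" whichever way the speaker says it.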
Cont… 
Grammar: Grammars define the domain, or context, within which the recognition engine works. The engine compares the current utterance against the words and phrases in the active grammars. If the user says something that is not in the grammar, the speech engine will not be able to understand it correctly, so general-purpose speech engines usually need very large grammars.
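The effect of an active grammar can be sketched as a simple membership check. The phrase set `ACTIVE_GRAMMAR` and the function `recognize` are hypothetical, and the sketch assumes the utterance has already been turned into candidate text.

```python
# Hypothetical active grammar: the only phrases the engine will accept.
ACTIVE_GRAMMAR = {"dial office", "dial home", "how far is it"}

def recognize(utterance, grammar=ACTIVE_GRAMMAR):
    """Return the normalized phrase if it is in the grammar, else None."""
    # Lowercase and collapse whitespace so spacing and case do not matter.
    phrase = " ".join(utterance.lower().split())
    return phrase if phrase in grammar else None
```

Anything outside the grammar comes back as `None`, which mirrors the point above: the engine cannot understand what its grammar does not cover.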
Accuracy: The performance of a recognizer can be examined by measuring its accuracy, that is, how well it recognizes utterances.
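One widely used way to put a number on accuracy is the word error rate: the word-level edit distance between what was said and what was recognized, divided by the number of words said. The slides do not name a metric, so treating accuracy this way is an assumption of this sketch, as is the function name `word_error_rate`.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dist[i][j] = edits needed to turn the first i reference words
    # into the first j hypothesis words.
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i                      # delete all i words
    for j in range(len(hyp) + 1):
        dist[0][j] = j                      # insert all j words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )
    return dist[-1][-1] / len(ref)
```

For example, recognizing "dial office" when the user said "dial the office" is one dropped word out of three.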
Training: 
Some speech recognizers have the ability to adapt to a speaker. When the system has this ability, it may allow training to take place.
Cont… 
Speaker Dependent Systems: Speech recognition systems that require a user to train the system to his/her voice are known as speaker-dependent systems. If you are familiar with desktop dictation systems, most of them, such as IBM ViaVoice, are speaker dependent.
Speaker Independent Systems: 
Speech recognition systems that do not require a user to train the system are known as speaker-independent systems.
How do humans do it? 
Articulation produces sound waves, which the ear conveys to the brain for processing.
How might computers do it? 
Digitization 
Acoustic analysis of the speech signal 
Linguistic interpretation 
Acoustic waveform 
Acoustic signal 
Speech recognition
How Does Speech Recognition Work?
•Audio input 
•Apply a "grammar" so the speech recognizer knows what phonemes to expect. 
•Acoustic Model 
•Recognized text
How do computers do it? 
•First, the user gives a voice command over the microphone, which is passed to the sound card in the system. This analog signal is sampled and converted into digital form using a technique called Pulse Code Modulation (PCM). The resulting digital waveform is a stream of amplitude values that, when plotted, looks like a wavy line.
•The digital signal is then split into short segments, and each segment is converted into the frequency domain. The incoming stream is now a set of discrete frequency bands, in a form that can be used by the speech recognizer.
•The next stage involves recognizing these bands of frequencies. For this, the speech recognition software has a database containing thousands of reference frequency patterns for the basic sounds of speech, or "phonemes", as they’re called.
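The first two steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the input is a synthetic 1000 Hz tone rather than real speech, the helper names `pcm_encode` and `dft_magnitudes` are made up, and the transform is a naive discrete Fourier transform over one segment, where practical recognizers typically apply a fast Fourier transform to many short overlapping frames.

```python
import cmath
import math

def pcm_encode(signal, bits=8):
    """Quantize samples in [-1.0, 1.0] to signed integers of the given bit depth."""
    max_level = 2 ** (bits - 1) - 1
    return [round(max(-1.0, min(1.0, s)) * max_level) for s in signal]

def dft_magnitudes(samples):
    """Naive discrete Fourier transform; returns magnitudes for bins 0..N/2."""
    n = len(samples)
    return [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n)
                for t, s in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]

# Assumed example input: a 1000 Hz tone sampled at 8000 Hz for 64 samples.
SAMPLE_RATE = 8000
tone = [math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(64)]

pcm = pcm_encode(tone)                # step 1: amplitudes as integers
magnitudes = dft_magnitudes(pcm)      # step 2: energy per frequency band

# The strongest band should be the one containing the tone's frequency.
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
peak_hz = peak_bin * SAMPLE_RATE / len(pcm)   # 1000.0 for this input
```

With 64 samples at 8000 Hz each band is 125 Hz wide, so the tone falls exactly into band 8. Matching such band patterns against stored phoneme patterns is the third step and is not attempted here.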
Hardware: 
Sound Cards 
Sound cards with the cleanest A/D (analog-to-digital) conversion are recommended.
Microphone
The best choice of microphone is the headset style.
Computers / Processors
The faster the processor, the better speech recognition works. For good speech recognition you should have a 1 GHz processor and 1 GB of RAM.
Where can it be used? 
•GPS: System control/navigation, e.g. GPS-connected digital maps: “How far is it to the motorway junction?”
•Commercial/industrial applications: in-car steering systems
•Mobile telephony: Voice dialing and hands-free use of a mobile phone in the car, e.g. “Dial office”
•Home automation: heating, ventilation and air conditioning
Where can it be used? 
•Military: System control/navigation, e.g. high-performance fighter aircraft, helicopters, and training of air traffic controllers
•Computer and Video Games: Speech input has been used in a limited number of computer and video games. The Microsoft Xbox, Nintendo GameCube, and Sony PlayStation 2 consoles all offer games with speech input/output. 
•Education: students who are blind
•Voice Security Systems: security locks for gates and doors
•Wearable Computers: The most futuristic application is in the use and functionality of wearable computers.
Speech Recognition Software 
•Dragon NaturallySpeaking
•IBM ViaVoice
•Microsoft Speech Recognition System
•MacSpeech Dictate
•Philips SpeechMagic
Pros of Speech Recognition 
•Faster than handwriting.
•Allows for better spelling in any text or document.
•Helpful for people with a mental or physical disability.
•Hands-free capability.
Cons of Speech Recognition 
•No program is 100% perfect 
•Factors that affect the accuracy of speech recognition are: slang, homonyms, signal-to-noise ratio, and overlapping speech 
•Can be expensive depending on the program 
•Easily misinterprets vocal commands, e.g. Siri
Conclusion 
•Speech recognition can revolutionize the way people conduct business over the Web and differentiate world-class e-businesses.
•VoiceXML ties speech recognition and telephony together.
•Voice-enabled Web solutions are available TODAY!
Generators: 
•Software generators are programs that build other programs. In computer science, the term generator also refers to a special routine that can be used to control the iteration behavior of a loop. In fact, all such generators are iterators.
•A generator is very similar to a function that returns an array, in that a generator has parameters, can be called, and generates a sequence of values. However, instead of building an array containing all the values and returning them all at once, a generator yields the values one at a time, which requires less memory and allows the caller to get started processing the first few values immediately. In short, a generator looks like a function but behaves like an iterator.
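The contrast described above can be shown side by side. The sketch uses Python, whose `yield` keyword creates generators of exactly this kind; the function names `squares_list` and `squares_gen` are made up for the example.

```python
def squares_list(n):
    """Build the whole list in memory and return it all at once."""
    return [i * i for i in range(n)]

def squares_gen(n):
    """Yield one square at a time; nothing is computed until requested."""
    for i in range(n):
        yield i * i

gen = squares_gen(5)
first = next(gen)    # computes only the first value: 0
rest = list(gen)     # consumes the remaining values: [1, 4, 9, 16]
```

Calling `squares_gen(5)` looks like a function call, but the result is an iterator: the caller can start working with `first` before any later value exists.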
Types of software generators: 
•Key generator (key-gen)
•Random password generator
•Code generator 
•Natural language generator 
•Random test generator 
•Pseudorandom number generator
Key generator (key-gen)
•A key generator (key-gen) is a computer program that generates a product licensing key, such as a serial number, necessary to activate a software application for use.
•Key-gens may be legitimately distributed by software manufacturers for licensing software in commercial environments where software has been licensed in bulk for an entire site or enterprise, or they may be distributed illegitimately in circumstances of copyright infringement or software piracy.
•A software license is a legal instrument that governs the usage and distribution of computer software.
•Illegitimate key generators are typically distributed by software crackers; for example, key-gens used to activate unlicensed copies of Windows, e.g. Windows 8, are already available.
Random password generator 
•A random password generator is a software program or hardware device that takes input from a random or pseudo-random number generator and automatically generates a password. Random passwords can be generated manually, using simple sources of randomness such as dice or coins, or they can be generated using a computer.
•While there are many examples of "random" password generator programs available on the Internet, generating randomness can be tricky and many programs do not generate random characters in a way that ensures strong security. A common recommendation is to use open source security tools where possible, since they allow independent checks on the quality of the methods used. Note that simply generating a password at random does not ensure the password is a strong password, because it is possible, although highly unlikely, to generate an easily guessed or cracked password. In fact there is no need at all for a password to have been produced by a perfectly random process: it just needs to be sufficiently difficult to guess.
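A minimal sketch of such a generator follows. It draws characters with Python's standard `secrets` module, which is intended for security-sensitive randomness; the function name `generate_password`, the default length of 16 and the chosen alphabet are assumptions for illustration.

```python
import secrets
import string

# Assumed character set: letters, digits and punctuation.
ALPHABET = string.ascii_letters + string.digits + string.punctuation

def generate_password(length=16, alphabet=ALPHABET):
    """Return a password of `length` characters drawn from `alphabet`."""
    if length < 1:
        raise ValueError("length must be at least 1")
    # secrets.choice uses the operating system's secure random source,
    # unlike the general-purpose `random` module.
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

As the text notes, randomness alone does not guarantee strength: a very short `length` or a tiny `alphabet` would still give passwords that are easy to guess.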
Pseudorandom number generators 
•A pseudorandom number generator (PRNG), also known as a deterministic random bit generator (DRBG), is an algorithm for generating a sequence of numbers whose properties approximate the properties of sequences of random numbers.
•Although sequences that are closer to truly random can be generated using hardware random number generators, pseudorandom number generators are important in practice for their speed in number generation and their reproducibility.
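Reproducibility is easy to see with one of the simplest PRNG designs, a linear congruential generator. The slides do not name an algorithm, so this choice is an assumption; the constants are the ones commonly cited from the sample `rand()` in the C standard, and a generator this simple is not suitable for security purposes.

```python
from itertools import islice

def lcg(seed, modulus=2**31, a=1103515245, c=12345):
    """Linear congruential generator: yields an endless stream of integers."""
    state = seed
    while True:
        # Each value is computed deterministically from the previous one.
        state = (a * state + c) % modulus
        yield state

run_a = list(islice(lcg(42), 5))   # first five values for seed 42
run_b = list(islice(lcg(42), 5))   # same seed: identical sequence
run_c = list(islice(lcg(43), 5))   # different seed: different sequence
```

Because the sequence depends only on the seed, `run_a` and `run_b` are equal, which is what makes simulations and tests repeatable.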
Code generator 
•In computing, code generation is the process by which a compiler's code generator converts some intermediate representation of source code into a form (e.g., machine code) that can be readily executed by a machine. 
•Sophisticated compilers typically perform multiple passes over various intermediate forms. This multi-stage process is used because many algorithms for code optimization are easier to apply one at a time, or because the input to one optimization relies on the completed processing performed by another optimization.
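A toy version of this step is sketched below: an expression tree stands in for the intermediate representation, and the generator emits instructions for a small stack machine. The tree format, the instruction names and the functions `generate` and `run` are all made up for illustration; a real compiler's code generator targets actual machine code and is far more involved.

```python
def generate(node):
    """Emit stack-machine instructions for an expression tree.

    A node is either a number or a tuple (op, left, right) with op in
    '+', '-', '*'. Operands are emitted before their operator.
    """
    if isinstance(node, (int, float)):
        return [("PUSH", node)]
    op, left, right = node
    opcode = {"+": "ADD", "-": "SUB", "*": "MUL"}[op]
    return generate(left) + generate(right) + [(opcode,)]

def run(program):
    """Execute the generated instructions on a simple stack machine."""
    stack = []
    for instr in program:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b, a = stack.pop(), stack.pop()   # right operand is on top
            if instr[0] == "ADD":
                stack.append(a + b)
            elif instr[0] == "SUB":
                stack.append(a - b)
            else:
                stack.append(a * b)
    return stack.pop()

# Intermediate form of the expression 2 + 3 * 4.
program = generate(("+", 2, ("*", 3, 4)))
result = run(program)   # 14
```

The generated `program` is a flat list (push 2, push 3, push 4, multiply, add), a form that a machine can execute one instruction at a time.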
Men have become the tools of their tools. -P. Kahoro 
The End
