SUBMITTED TO: MR. ABHISHEK SRIVASTAVA
SUBMITTED BY: VIKALP MAHENDRA (EC-11)
 Introduction 
 Block Diagram 
 Linguistic Levels Of Analysis 
 Phonetics 
 Organs Of Speech And Articulation 
 Acoustic Model 
 Circuit Diagram 
 Components Used 
 Features Of HM2007 
 Working 
 Extracting Phonemes In Frequency Domain 
 Markov Model 
 Advantages 
 Applications 
 Conclusion
 Speech recognition analyses sound and converts spoken words into text,
using knowledge of spoken English
 Programs are available for voice recognition. 
 Systems work best on Windows XP & Windows Vista
[Block diagram: Computers; Databases; Algorithms; Robotics; Natural Language Processing; Search; Information Retrieval; Machine Translation; Language Analysis; Semantics]
 Speech 
 Written language 
 Phonology: sounds / letters / pronunciation 
 Morphology: the structure of words 
 Syntax: how sequences of words are structured
 Semantics: meaning of the strings
Phonetics - the study of the way humans make, transmit, and
receive sounds
Phonology - the study of sound systems of languages
 A typical word such as moon can be broken down into
three phonemes: m, ue, n.
Phonemes represent all the vowels and consonants of
spoken speech
 Most vowel sounds are modified by the shape of 
the lips (rounded / spread / neutral) 
 Sounds are made by vibrating the vocal cords 
(voicing) 
 Vowels can be:
 Single sounds – Monophthongs or pure vowels 
 Double sounds - Diphthongs 
 Triple sounds - Triphthongs 
 Pure vowels usually come in pairs consisting of 
long and short sounds
The long vowel ‘i:’ is found in the word tea. The lips are spread and the sound is long.
The short vowel ‘I’ is found in the word hip. The lips are slightly spread and the sound is short.
The tongue tip is raised slightly at the front towards the alveolar ridge. In the longer sound the
tongue is raised higher.
 This sound (the schwa) is made by relaxing the mouth,
keeping the lips in a neutral position and making
a short sound. It is found in words like paper,
over and about, and is common in weak (unstressed) forms in spoken
English.
The long sound – you, too & blue
The short sound – good, would &
wool
The lips are rounded and the centre
and back of the tongue are raised towards
the soft palate. For the longer sound the
tongue is raised higher and the lips are
more rounded.
The next sound is made with the mouth
spread wide open. It is found in cat,
man, apple & ran
 Here we have three sounds, the vowel sounds in:
1) for 2) tour 3) go
 Triphthongs are combinations of three sounds.
English has 1 triphthong (a diphthong + a
schwa sound)
 Diphthongs are combinations of two sounds.
Diphthongs are combinations of pure vowels.
• a: + I = ‘aI’ - tie, buy, height & night
• e + I = ‘eI’ - way, paid & gate
• o: + I = ‘oI’ - boy, coin & coy
• e + ə = ‘eə’ - where, hair & care
• I + ə = ‘Iə’ - here, hear & beer
 An acoustic model takes audio recordings of speech and creates a
statistical representation of the sounds.
 To create a speech recognition engine, a large
database of models is created to match each
phoneme
 These database models hold the stored
phonemes
 The language model holds the grammar of the
sentence, which is used to decode the spoken words into text.
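As a rough illustration of what a "statistical representation of sound" can look like, the sketch below stores one mean feature vector per phoneme and matches a new observation to the closest stored model. The feature values, the phoneme labels and the function names are invented for illustration only; a real acoustic model is far richer than a table of averages.

```python
import math

def train_acoustic_model(examples):
    """Average the feature vectors recorded for each phoneme."""
    model = {}
    for phoneme, vectors in examples.items():
        dims = len(vectors[0])
        model[phoneme] = [sum(v[d] for v in vectors) / len(vectors)
                          for d in range(dims)]
    return model

def match_phoneme(model, observation):
    """Return the stored phoneme whose mean vector is closest to the observation."""
    return min(model, key=lambda p: math.dist(model[p], observation))

# Hypothetical two-dimensional features (for example two resonance frequencies in Hz).
TRAINING = {
    "aa": [[700.0, 1100.0], [740.0, 1180.0]],
    "iy": [[280.0, 2250.0], [300.0, 2300.0]],
    "uw": [[310.0, 870.0], [330.0, 900.0]],
}

model = train_acoustic_model(TRAINING)
best = match_phoneme(model, [290.0, 2200.0])
print(best)
```

The "database of models" from the slide corresponds here to the `model` dictionary; the language model would sit on top of this step and is not shown.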
 HM 2007 IC 
 SRAM 8K*8 
 LATCH 74LS373 
 INPUT BUFFER 7448 
 XTAL 3.57MHz 
 PCB 
 KEYPAD 
 PC MOUNTED SWITCHES 
 7 SEGMENT DISPLAY 
 MICROPHONE 
 22K RESISTOR 
 100K RESISTOR 
 0.0047 µF CAPACITOR
 A single-chip voice recognition system
in a 48-pin package.
 Manufactured by Hualon
 Recognizes a maximum of 40 words; maximum word length 1.92 sec
 Microphone support 
 5V power supply
 How does a computer convert spoken speech into data?
 When we speak, a microphone picks up the analog signal of our voice, which is converted into
digital chunks of data that the computer analyzes.
 It is from this data that the computer extracts enough information to
confidently guess the word being spoken
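The conversion from an analog signal into "digital chunks" can be sketched as sampling, quantizing and framing. This is a minimal sketch, not the circuit's actual behaviour: the 8 kHz sampling rate, the 16-bit sample size, the 25 ms chunk length and the synthetic tone standing in for the microphone are all assumptions.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
FRAME_SIZE = 200     # samples per chunk, 25 ms at 8 kHz (assumed)

def digitize(signal, duration_ms, sample_rate=SAMPLE_RATE):
    """Sample a continuous signal and quantize each sample to a 16-bit integer."""
    count = duration_ms * sample_rate // 1000
    return [round(signal(n / sample_rate) * 32767) for n in range(count)]

def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Cut the sample stream into fixed-size chunks, dropping any short remainder."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, frame_size)]

def tone(t):
    """Stand-in for the microphone voltage: a 440 Hz tone in the range -1..1."""
    return math.sin(2 * math.pi * 440 * t)

samples = digitize(tone, duration_ms=100)
frames = split_into_frames(samples)
print(len(samples), "samples in", len(frames), "chunks")
```

Each element of `frames` is one chunk of data that later analysis steps would work on.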
 The goal is to extract phonemes
 Phonemes are linguistic units
 They are the sounds that group together to form words
 How a phoneme is realised as sound depends on many factors
 aa - father 
 ae - cat 
 ah - cut 
 ao - dog 
 aw - foul 
 ng - sing 
 t - talk 
 th - thin 
 uh - book 
[Figure: waveform showing the frequency characteristics of phonemes]
 Phonemes are extracted by running the waveform through a Fourier
transform
 They are easily visible in the frequency domain
 This can be made out by looking at a spectrogram
 A spectrogram is a 3-D plot of the waveform's frequency and amplitude
versus time, with amplitude shown in grey scale
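The frequency-domain view can be sketched with a naive discrete Fourier transform applied frame by frame, which gives the rows of a spectrogram. The sampling rate, the frame length and the two-tone test signal are assumptions chosen so that each tone falls exactly on one frequency bin; the test signal stands in for speech and is not real phoneme data.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive discrete Fourier transform; returns magnitudes for bins 0..N/2."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def spectrogram(samples, frame_size):
    """One magnitude spectrum per frame: time along rows, frequency along columns."""
    return [dft_magnitudes(samples[i:i + frame_size])
            for i in range(0, len(samples) - frame_size + 1, frame_size)]

SAMPLE_RATE = 8000  # Hz (assumed)
FRAME_SIZE = 80     # 10 ms frames, so each frequency bin is 100 Hz wide

# Test signal: 20 ms of a 500 Hz tone followed by 20 ms of a 1500 Hz tone.
samples = [math.sin(2 * math.pi * (500 if n < 160 else 1500) * n / SAMPLE_RATE)
           for n in range(320)]

spec = spectrogram(samples, FRAME_SIZE)
# Strongest frequency in each frame, converted from bin index to Hz.
peaks = [max(range(len(row)), key=row.__getitem__) * SAMPLE_RATE // FRAME_SIZE
         for row in spec]
print(peaks)
```

Plotting each value in `spec` as a shade of grey, with time on one axis and frequency on the other, would give the kind of picture the slide describes.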
 The computer generates a list of phonemes
 These phonemes have to be converted into words and then into
sentences, so a Markov model is used
 It compares the observed phonemes with the stored phonemes
 In this model, the word tomato is represented with both its British and American
English pronunciations
 This idea is extended up to the level of sentences and improves
recognition
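A word model of this kind can be sketched as a Markov chain over phoneme states, with a branch where the two pronunciations of tomato differ. The state names, the phoneme spellings and especially the 0.6 / 0.4 branch probabilities below are hypothetical values chosen for illustration, not figures from the slides.

```python
# Hypothetical word model for "tomato": state -> {next state: probability}.
TOMATO = {
    "start": {"t1": 1.0},
    "t1": {"ow1": 1.0},
    "ow1": {"m": 1.0},
    "m": {"ey": 0.6, "aa": 0.4},   # the two pronunciations branch here
    "ey": {"t2": 1.0},
    "aa": {"t2": 1.0},
    "t2": {"ow2": 1.0},
    "ow2": {"end": 1.0},
}

# Phoneme produced by each state (the word contains "t" and "ow" twice).
PHONEME_OF = {"t1": "t", "ow1": "ow", "m": "m", "ey": "ey",
              "aa": "aa", "t2": "t", "ow2": "ow"}

def sequence_probability(model, phonemes):
    """Probability that the word model generates exactly this phoneme sequence."""
    current = {"start": 1.0}
    for phoneme in phonemes:
        following = {}
        for state, prob in current.items():
            for nxt, p in model.get(state, {}).items():
                if PHONEME_OF.get(nxt) == phoneme:
                    following[nxt] = following.get(nxt, 0.0) + prob * p
        current = following
    # The sequence only counts if the model can finish right after it.
    return sum(prob * model.get(state, {}).get("end", 0.0)
               for state, prob in current.items())

with_ey = sequence_probability(TOMATO, ["t", "ow", "m", "ey", "t", "ow"])
with_aa = sequence_probability(TOMATO, ["t", "ow", "m", "aa", "t", "ow"])
broken = sequence_probability(TOMATO, ["t", "ow", "m", "t", "ow"])
print(with_ey, with_aa, broken)
```

Comparing an observed phoneme list against several such word models and keeping the most probable one is the matching step; chaining word models together extends the same idea to sentences.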
 It is used to translate between different forms of language
 It is used in telephones
 The standard landline telephone has a bandwidth (bit rate) of 64 kb/s.
 Sampling rate of 8 kHz
 On a standard desktop PC, the limiting factor is the sound card. It can
record at sampling rates between 16 kHz and 48 kHz
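The telephone figures are related by simple arithmetic: bit rate = sampling rate × bits per sample. The slides do not state the sample size; 8 bits per sample is assumed below because it is the value that reproduces 64 kb/s at 8 kHz, and the 16-bit size used for the desktop rates is likewise an assumption.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Bit rate of uncompressed PCM audio in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def max_frequency(sample_rate_hz):
    """Highest frequency a given sampling rate can represent (Nyquist limit)."""
    return sample_rate_hz / 2

telephone = pcm_bit_rate(8_000, 8)        # 8-bit samples are an assumption
desktop_low = pcm_bit_rate(16_000, 16)    # 16-bit samples are an assumption
desktop_high = pcm_bit_rate(48_000, 16)

print(telephone, "bit/s on the telephone line")
print(desktop_low, "to", desktop_high, "bit/s on a desktop sound card")
print(max_frequency(8_000), "Hz is the highest frequency captured at 8 kHz")
```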
 MILITARY 
 HELICOPTERS 
 IN MOBILE SMARTPHONES
 SPEECH CONTROLLED APPLIANCES
 VOICE RECOGNITION SECURITY
 Speech recognition is one of the latest technologies.
 It reduces costs such as that of training
 Steps : 
 Fourier transform of signal 
 Extraction of Phonemes 
 Formation of word on the basis of Markov Models 
 Charm of Simplicity 
 With the advent of this technology, we will hopefully see a
new era of human-computer interaction.
Speech and Language Processing
