The document discusses a speech recognition system, detailing its components, working principles, and applications in technology. It explains the extraction of phonemes, the use of Markov models for word formation, and highlights the significance of fundamental linguistic concepts like phonetics and semantics. The system is noted to operate effectively on specific versions of Windows and supports various fields, including robotics and natural language processing.
The presentation is submitted to Mr. Abhishek Srivastava by Vikalp Mahendra, outlining key topics in speech recognition and related components.
Discusses voice recognition systems analyzing sound to convert speech to text, primarily on Windows XP and Vista, highlighting linkages to databases and NLP.
Explains linguistic levels: phonology, morphology, syntax, and semantics, elaborating on phonetics and the human production of sounds.
Explores vowel sounds, including monophthongs, diphthongs, and triphthongs, detailing their articulation and characteristics.
Describes audio recording methods for creating a speech recognition engine, phoneme extraction via Fourier transform, and spectral analysis.
Details the use of the Markov model to convert phonemes into words, improving recognition through context-based analysis.
Highlights applications including telephones, military, smartphones, and speech-controlled appliances, emphasizing cost reduction and future of human-computer interaction.
Lists references and resources related to speech recognition and natural language processing.
Introduction
Block Diagram
Linguistic Levels Of Analysis
Phonetics
Organs Of Speech And Articulation
Acoustic Model
Circuit Diagram
Components Used
Features Of HM2007
Working
Extracting Phonemes In Frequency Domain
Markov Model
Advantages
Applications
Conclusion
3.
Analyses sound and converts spoken word into text
Uses knowledge of spoken English
Programs are available for voice recognition.
Systems work best on Windows XP & Windows Vista
4.
[Diagram: Natural Language Processing and related areas: Computers, Databases, Algorithms, Robotics, Search, Information Retrieval, Machine Translation, Language Analysis, Semantics]
5.
Speech
Written language
Phonology: sounds / letters / pronunciation
Morphology: the structure of words
Syntax: how these sequences are structured
Semantics: meaning of the strings
6.
The study of the way humans make, transmit, and
receive sounds
Phonology - the study of sound systems of languages
A typical word such as moon is broken down into
three phonemes: m, ue, n.
Phonemes represent all the vowels and consonants of
spoken speech
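The breakdown of a word into phonemes can be sketched as a simple lookup. This is only an illustration: the dictionary `PHONEME_DICT` and the helper `to_phonemes` are hypothetical names, and the single entry uses the symbols given on the slide for moon.

```python
# Hypothetical pronunciation dictionary; the symbols for "moon" follow the slide.
PHONEME_DICT = {
    "moon": ["m", "ue", "n"],
}

def to_phonemes(word):
    """Return the phoneme list for a word, or None if the word is not listed."""
    return PHONEME_DICT.get(word.lower())

print(to_phonemes("moon"))
```

A real system would hold many thousands of such entries, often with more than one pronunciation per word.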
8.
Most vowel sounds are modified by the shape of
the lips (rounded / spread / neutral)
Sounds are made by vibrating the vocal cords
(voicing)
Vowels can be:
Single sounds - Monophthongs or pure vowels
Double sounds - Diphthongs
Triple sounds - Triphthongs
Pure vowels usually come in pairs consisting of
long and short sounds
9.
This is found in the word tea. The lips are spread and the sound is long.
This is found in the word hip. The lips are slightly spread and the sound is short.
The tongue tip is raised slightly at the front towards the alveolar ridge. In the longer sound the
tongue is raised higher.
10.
This sound is made by relaxing the mouth,
keeping the lips in a neutral position, and making
a short sound. It is found in words like paper,
over, and about, and is common in weak (unstressed) forms in spoken
English.
11.
The long sound - you, too & blue
The short sound - good, would &
wool
The lips are rounded and the centre
and back of the tongue are raised towards
the soft palate. For the longer sound the
tongue is raised higher and the lips are
more rounded.
This sound is made with the mouth
spread wide open. It is found in cat,
man, apple & ran
12.
Here we have three sounds, the sounds from:
1) for 2) tour 3) go
Triphthongs are combinations of three sounds.
English has 1 triphthong (a diphthong + a
schwa sound)
Diphthongs are combinations of two sounds.
13.
Diphthongs are combinations of pure vowels.
• a: + I = ‘aI’ - tie, buy, height & night
• e + I = ‘eI’ - way, paid & gate
• o: + I = ‘oI’ - boy, coin & coy
• e + ə = ‘eə’ - where, hair & care
• I + ə = ‘Iə’ - here, hear & beer
14.
The acoustic model is built from audio recordings of speech to create a
statistical representation of sounds.
To create a speech recognition engine, a large
database of models is created to match each
phoneme
These database models store the
phonemes
The language model holds the grammar of the
sentence, which is used to decode our spoken words into text.
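Matching an observed sound against a database of stored phoneme models can be sketched as a nearest-match search. This is a deliberate simplification: real engines use statistical models rather than a plain distance, and the feature vectors, the table `PHONEME_MODELS`, and the helper `closest_phoneme` below are all made up for illustration.

```python
import math

# Hypothetical stored models: one made-up feature vector per phoneme.
PHONEME_MODELS = {
    "aa": [0.9, 0.3, 0.1],
    "ae": [0.7, 0.6, 0.2],
    "uh": [0.2, 0.4, 0.8],
}

def closest_phoneme(features, models=PHONEME_MODELS):
    """Return the phoneme whose stored vector is nearest (Euclidean) to the input."""
    return min(models, key=lambda p: math.dist(features, models[p]))

print(closest_phoneme([0.68, 0.62, 0.25]))
```

The larger and more varied the database of stored models, the better the chance that an observed sound lands close to the right phoneme.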
The HM2007 is a single-chip voice recognition system
with 48 pins.
Manufactured by Hualon
Maximum of 40 words, with a word length of up to 1.92 seconds
Microphone support
5 V power supply
19.
How does a computer convert spoken speech into data?
When we speak, a microphone picks up the analog signal of our voice, which is converted into
digital chunks of data that the computer analyzes.
It is from this data that the computer extracts enough information to
confidently guess the word being spoken
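The conversion into digital chunks can be sketched as sampling, quantizing, and splitting into frames. This is a minimal sketch under assumptions: a sine wave stands in for the microphone signal, and the sample rate, the 8-bit quantization, and the 160-sample frame size are illustrative choices, not values from the slides.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
FRAME_SIZE = 160     # 20 ms per chunk at 8 kHz (assumed)

def digitize(duration_s, freq_hz=440.0, levels=256):
    """Sample a sine wave (a stand-in for the voice signal) and quantize
    each sample to an integer in the range 0..levels-1."""
    n = round(duration_s * SAMPLE_RATE)
    samples = []
    for i in range(n):
        value = math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)  # in [-1, 1]
        samples.append(round((value + 1) / 2 * (levels - 1)))
    return samples

def split_into_frames(samples, frame_size=FRAME_SIZE):
    """Cut the sample stream into fixed-size chunks, dropping any remainder."""
    return [samples[i:i + frame_size]
            for i in range(0, len(samples) - frame_size + 1, frame_size)]

samples = digitize(0.1)
frames = split_into_frames(samples)
print(len(samples), len(frames))
```

Each frame is short enough that the sound within it is roughly constant, which is what makes the later frequency analysis meaningful.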
20.
To extract phonemes
Phonemes are linguistic units
They are the sounds that group together to form words
How a phoneme is realized as sound depends on many factors
21.
aa - father
ae - cat
ah - cut
ao - dog
aw - foul
ng - sing
t - talk
th - thin
uh - book
[Figure: waveform showing the frequency characteristics of phonemes]
22.
Phonemes are extracted by running the waveform through a Fourier
transform
They are easily visible in the frequency domain
They can be made out by looking at a spectrogram
A spectrogram is a 3-D plot of the waveform's frequency and amplitude
versus time, with amplitude shown in shades of grey
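The move to the frequency domain can be sketched with a naive discrete Fourier transform applied frame by frame. This is an illustration only: practical systems use a fast Fourier transform with windowing, and the function names, the 8 kHz sample rate, and the 500 Hz test tone are assumptions chosen for the example.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Naive discrete Fourier transform; returns magnitudes for bins 0..N/2."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]

def dominant_frequency(frame, sample_rate):
    """Frequency in Hz of the strongest non-DC bin in the frame."""
    mags = dft_magnitudes(frame)
    peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])
    return peak_bin * sample_rate / len(frame)

def spectrogram(samples, frame_size):
    """One row of magnitudes per frame: frequency and amplitude versus time."""
    frames = [samples[i:i + frame_size]
              for i in range(0, len(samples) - frame_size + 1, frame_size)]
    return [dft_magnitudes(f) for f in frames]

SAMPLE_RATE = 8000  # assumed
tone = [math.sin(2 * math.pi * 500 * t / SAMPLE_RATE) for t in range(480)]
print(dominant_frequency(tone[:160], SAMPLE_RATE))
print(len(spectrogram(tone, 160)))
```

Drawing each row of the spectrogram as a column of grey levels, darker for larger magnitudes, gives the kind of plot described on the slide.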
23.
The computer generates a list of phonemes
These phonemes have to be converted into words and then into
sentences, so a Markov model is used
It compares the observed phonemes with the stored phonemes
24.
In this example, the word tomato is modelled in both its British English and American
English pronunciations
This idea is extended up to the level of sentences, which improves
recognition
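The tomato example can be sketched as a small Markov chain in which the path branches at the vowel that differs between the two pronunciations. The state names, the phoneme symbols "ey" and "aa", and the 0.5 / 0.5 split are assumptions for illustration, not values taken from the slides.

```python
# Each state emits one phoneme; the branch after "m" covers both pronunciations.
STATE_PHONEME = {
    "s1": "t", "s2": "ow", "s3": "m",
    "s4a": "ey",  # "to-may-to" (American English)
    "s4b": "aa",  # "to-mah-to" (British English)
    "s5": "t", "s6": "ow",
}

# Made-up transition probabilities between states.
TRANSITIONS = {
    "START": {"s1": 1.0},
    "s1": {"s2": 1.0},
    "s2": {"s3": 1.0},
    "s3": {"s4a": 0.5, "s4b": 0.5},
    "s4a": {"s5": 1.0},
    "s4b": {"s5": 1.0},
    "s5": {"s6": 1.0},
    "s6": {"END": 1.0},
}

def sequence_probability(phonemes):
    """Probability that the word model generates exactly this phoneme sequence."""
    current = {"START": 1.0}  # state -> probability of being there so far
    for ph in phonemes:
        nxt = {}
        for state, prob in current.items():
            for target, p in TRANSITIONS.get(state, {}).items():
                if STATE_PHONEME.get(target) == ph:
                    nxt[target] = nxt.get(target, 0.0) + prob * p
        current = nxt
    return sum(prob * TRANSITIONS.get(state, {}).get("END", 0.0)
               for state, prob in current.items())

print(sequence_probability(["t", "ow", "m", "ey", "t", "ow"]))
print(sequence_probability(["t", "ow", "m", "t", "ow"]))
```

A recognizer would score the observed phonemes against one such model per word and pick the most probable; chaining word models in the same way extends the idea to sentences.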
25.
It is used to translate between different forms of language
It is used in telephones
The standard landline telephone has a data rate of 64 kbit/s
at a sampling rate of 8 kHz
In a standard desktop PC, the limiting factor is the sound card, which can
record at sampling rates between 16 kHz and 48 kHz
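The telephone figures fit together if each sample takes 8 bits, which is an assumption not stated on the slide. The sketch below works through that arithmetic; the 16 bits per sample used for the sound card is also only an illustrative choice.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed data rate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

def highest_frequency(sample_rate_hz):
    """Highest frequency a given sampling rate can capture (half the rate)."""
    return sample_rate_hz / 2

# Telephone: 8 kHz sampling, assumed 8 bits per sample.
print(pcm_bit_rate(8000, 8), highest_frequency(8000))

# Sound card range from the slide, assumed 16 bits per sample.
print(pcm_bit_rate(16000, 16), pcm_bit_rate(48000, 16))
```

The higher sampling rates of a sound card capture more of the high-frequency detail in speech than a telephone line does.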
26.
MILITARY
HELICOPTERS
IN MOBILE SMARTPHONES
SPEECH CONTROLLED APPLIANCES
VOICE RECOGNITION SECURITY
27.
The speech recognition system is one of the latest technologies.
It reduces costs, such as the cost of training
Steps:
Fourier transform of the signal
Extraction of phonemes
Formation of words on the basis of Markov models
Charm of simplicity
With the advent of this technology, we will hopefully see a
new era of human-computer interaction.
28.
From: Chapter 1 of An Introduction to Natural Language
Processing, Computational Linguistics, and Speech
Recognition, by Daniel Jurafsky and James H. Martin
http://en.wikipedia.org/wiki/acoustic model
http://en.wikipedia.org/wiki/speech recognition
www.wikpedia.org
www.slideshare.net
Natural Language Processing by Rada Mihalcea
www.youtube.com