Complete PowerPoint presentation on SPEECH RECOGNITION TECHNOLOGY.
Very helpful for final-year students preparing a seminar.
It can be used as a final-year seminar presentation.
Speech recognition is a very interesting seminar topic.
speech recognition, history of speech recognition, what is speech recognition, voice recognition software, advantages and disadvantages of speech recognition, voice recognition, voice recognition in operating systems, types of speech recognition
2. www.studymsfis.org
Preface
I have made this report file on the topic SPEECH RECOGNITION and have tried my best to
elucidate all the details relevant to the topic. At the beginning I
have tried to give a general view of the topic.
My efforts and the wholehearted cooperation of everyone involved have ended on a successful
note. I express my sincere gratitude to ………….., who assisted me throughout the preparation
of this topic. I thank him for providing me with reinforcement, confidence and, most importantly,
direction for the topic whenever I needed it.
Acknowledgement
I would like to thank respected Mr…….. and Mr. …….. for giving me such a wonderful
opportunity to expand my knowledge of my own branch and for giving me guidelines to present a
seminar report. It helped me a lot to realize what we study for.
Secondly, I would like to thank my parents, who patiently helped me as I went through my work
and helped to modify and eliminate some of the irrelevant or unnecessary material.
Thirdly, I would like to thank my friends, who helped me make my work more organized and
well structured to the end.
Next, I would like to thank Microsoft for developing such a wonderful tool as MS Word. It helped
a lot in keeping my work error-free.
Last but clearly not least, I would like to thank The Almighty for giving me the strength to complete
my report on time.
Content
1. Introduction
2. Speech Recognition
3. How does the ASR Technology Work?
4. Speech Recognition Process
5. Variation in Speech
6. Classification of ASR System
7. Vocabularies for computers
8. Which system to choose
9. Advantages of speaker independent system
10. Advantages of speaker dependent system
11. Techniques in Vogue
12. Benefits
13. Drawbacks
14. Conclusion
INTRODUCTION
The computer revolution is now well advanced, but although we see a startling
spread of computers into many forms of work people do, the domain of computers is
still significantly small because of the specialized training needed to use them and the lack of
intelligence in computer systems. In the history of computer science five generations have passed
by, each adding an innovative technology that brought computers nearer and nearer to
people. Now it is the sixth generation, whose prime objective is to make computers more intelligent,
i.e., to make computer systems that can think as humans do. The fifth generation aimed at using
conventional symbolic Artificial Intelligence techniques to achieve machine intelligence. This
failed. Statistical modeling and neural nets really belong to the sixth generation. The goal of work in
Artificial Intelligence is to build machines that perform tasks normally requiring human
intelligence. True, but speech recognition, seeing and walking do not require "intelligence" so much as
human perceptual ability and motor control. Speech technology is now one of the most
significant scientific research fields under the broad domain of AI; indeed it is a major
sub-domain of computer science, apart from traditional linguistics and other disciplines that study
spoken language.
SPEECH RECOGNITION
The days when you had to keep staring at the computer screen and frantically hit the keys
or click the mouse for the computer to respond to your commands may soon be a thing of the past.
Today you can stretch out, relax and tell your computer to do your bidding. This has been
made possible by ASR (Automatic Speech Recognition) technology.
ASR technology would be particularly welcomed by automated telephone exchange
operators, doctors and lawyers, besides others who seek freedom from tiresome conventional
computer operation using the keyboard and mouse. It is suitable for applications in which
computers are used to provide routine information and services. ASR's direct speech-to-text
dictation offers a significant advantage over traditional transcription. With further refinement of
the technology, typing in text will become a thing of the past. ASR offers a solution to this fatigue-causing
procedure by converting speech into text.
ASR technology is presently capable of achieving recognition accuracies of 95% to 98%,
but only under ideal conditions. The technology is still far from perfect in the uncontrolled
real world. The roots of this technology can be traced to 1968, when the term Information
Technology hadn't even been coined. Americans had only begun to realize the vast potential of
computers. The Hollywood blockbuster 2001: A Space Odyssey featured a talking, listening computer,
HAL-9000, which to date is a cult figure in both science fiction and the
world of computing. Even today almost every speech recognition technologist dreams of
designing a HAL-like computer with a clear voice and the ability to understand normal speech.
Though ASR technology is still not as versatile as the imaginary HAL, it can nevertheless be
used to make life easier. New application-specific standard products, interactive error-recovery
techniques and better voice-activated user interfaces allow the handicapped, the computer-illiterate
and rotary-dial phone owners to talk to computers. By offering a natural human interface
to computers, ASR finds applications in telephone call centers, such as airline flight information
systems, learning devices, toys, etc.
HOW DOES THE ASR TECHNOLOGY WORK?
When a person speaks, compressed air from the lungs is forced through the vocal tract as
a sound wave that varies as per the variations in the lung pressure and the vocal tract. This
acoustic wave is interpreted as speech when it falls upon a person’s ear. In any machine that
records or transmits human voice, the sound wave is converted into an electrical analogue signal
using a microphone. When we speak into a telephone receiver, for instance, its microphone
converts the acoustic wave into an electrical analogue signal that is transmitted through the
telephone network. The electrical signal from the microphone varies in amplitude over
time and is referred to as an analogue signal or an analogue waveform. If the signal results from
speech, it is known as a speech waveform. Speech waveforms have the characteristic of being
continuous in both time and amplitude.
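Before a computer can work with such a waveform, the signal has to be made discrete in both time and amplitude. The following is a minimal sketch of that idea, not a description of any particular ASR product: the function name `sample_and_quantize`, the 200 Hz test tone, the 8000 Hz sampling rate and the 8-bit depth are all assumptions chosen for illustration.

```python
import math

def sample_and_quantize(signal, duration_s, sample_rate_hz, bits):
    """Sample a continuous-time signal and round each sample to an integer level.

    `signal` is any function of time (in seconds) returning a value in [-1, 1].
    """
    levels = 2 ** (bits - 1) - 1                 # e.g. 127 for signed 8-bit samples
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                   # discrete time instant
        x = max(-1.0, min(1.0, signal(t)))       # clip to the converter's input range
        samples.append(round(x * levels))        # discrete amplitude level
    return samples

def tone(t):
    """Hypothetical stand-in for a speech waveform: a 200 Hz sine at half scale."""
    return 0.5 * math.sin(2 * math.pi * 200 * t)

# 10 ms of signal at assumed telephone-like settings (8000 Hz, 8 bits).
pcm = sample_and_quantize(tone, duration_s=0.01, sample_rate_hz=8000, bits=8)
print(len(pcm), pcm[:5])
```

The result is a plain list of integers, which is the form in which later processing stages can operate on the speech.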
[Figure: THE ROAD TO HAL. A person speaks "THE ROAD TO HAL"; the electrical signal goes into the computer; background noise removal and sound amplification; words are broken up into phonemes; language analysis; matching and choosing the right character combination. Candidate words shown: THE, PER, Road, Load, To, Two, Moo, Boo, Gall, Mall, Cal.]
A listener's ears and brain receive and process the analogue speech waveforms to figure out
the speech. ASR-enabled computers, too, work on the same principle by picking up acoustic
cues for speech analysis and synthesis. Because it helps to understand ASR technology
better, let us dwell a little more on the acoustic process of the human articulatory system. In the
vocal tract the process begins from the lungs. The variations in air pressure cause vibrations in
the folds of skin that constitute the vocal cords. The elongated orifice between the vocal cords
is called the glottis. As a result of the vibrations, repeated bursts of compressed air are released
into the air as sound waves.
Articulators in the vocal tract are manipulated by the speaker to produce various effects.
The vocal cords can be stiffened or relaxed to modify the rate of vibration, or they can be
turned off and the vibration eliminated while still allowing air to pass. The velum acts as a gate
between the oral and the nasal cavities. It can be closed to isolate or opened to couple the two
cavities. The tongue, jaw, teeth, and lips can be moved to change the shape of the oral cavity.
The nature of the sound pressure wave radiating outward from the lips depends upon these
time-varying articulations and upon the absorptive qualities of the vocal tract's materials. The
sound pressure wave exists as a continually moving disturbance of air. Particles move
closer together as the pressure increases or move further apart as it decreases, each influencing
its neighbor in turn as the wave propagates at the speed of sound. The amplitude of the wave at
any position distant from the speaker is measured by the density of air molecules and grows
weaker as the distance increases. When this wave falls upon the ear it is interpreted as sound
with discernible timbre, pitch, and loudness.
Air under pressure from the lungs moves through the vocal tract and comes into contact
with various obstructions including palate, tongue, teeth, lips and timings. Some of its energy is
absorbed by these obstructions; most is reflected. Reflections occur in all directions so that parts
of waves bounce around inside the cavities for some time, blending with other waves, dissipating
energy and finally finding their way out through the nostrils or past the lips.
Some waves resonate inside the tract according to their frequency and the cavity’s shape
at that moment, combining with other reflections, reinforcing the wave energy before exiting.
Energy in waves of other, non-resonant frequencies is attenuated rather than amplified in its
passage through the tract.
A speech recognition system involves five major steps:
You speak into a microphone, which converts sound waves into electrical signals. The
ASR program removes all noise and retains only the words that you have spoken. The words are
broken down into individual sounds, known as phonemes, which are the smallest sound units
discernible.
In the next and most complex part, the program's database maps sounds to character groups.
The program also has a big dictionary of popular words that exist in the language. Each
phoneme is matched against the sounds and converted into the appropriate character group. This
is where problems begin. To overcome the difficulties encountered in this phase, the program
uses several methods: first, it compares words that are similar in sound with what
it has heard; then it follows a system of language analysis to check whether the language allows a
particular syllable to appear after another.
Then comes the grammar and language check. The program tries to find out whether or not the
combination of words makes any sense. This is very similar to the grammar-check package that
you find in word processors.
The numerous words constituting the speech are finally noted down in the word
processor. While all speech recognition programs come with their own word processors, some
can work with other word processing packages like MS Word and WordPerfect. In fact, OS/2
allows even operating system commands to be spoken.
THE SPEECH RECOGNITION PROCESS
When a person speaks, compressed air from the lungs is forced through the vocal tract as
a sound wave that varies with the variations in the lung pressure and the vocal tract. This
acoustic wave is interpreted as speech when it falls upon a person's ear. Speech waveforms
have the characteristic of being continuous in both time and amplitude.
Any speech recognition system involves the following major steps:
1. Converting sounds into electrical signals: when we speak into a microphone, it converts
sound waves into electrical signals. In any machine that records or transmits human
voice, the sound wave is converted into an electrical signal using a microphone. When we
speak into a telephone receiver, for instance, its microphone converts the acoustic wave
into an electrical analogue signal that is transmitted through the telephone network. The
strength of the electrical signal from the microphone varies in amplitude over time and is
referred to as an analogue signal or an analogue waveform.
2. Background noise removal: the ASR program removes all noise and retains the words
that you have spoken.
3. Breaking up words into phonemes: the words are broken down into individual sounds,
known as phonemes, which are the smallest sound units discernible. For each short
interval of time, a feature value is computed from the wave. In this way, the wave is
divided into small parts, called phonemes.
4. Matching and choosing the character combination: this is the most complex phase. The
program has a big dictionary of popular words that exist in the language. Each phoneme is
matched against the sounds and converted into the appropriate character group. This is where
problems begin. The program compares words that are similar in sound with what it has
heard. All these similar words are collected.
5. Language analysis: here the program checks whether the language allows a particular
syllable to appear after another.
6. After that, there is a grammar check. The program tries to find out whether or not the
combination of words makes any sense. That is, there is a grammar-check package.
7. Finally, the numerous words constituting the speech are noted down in the word processor.
While speech recognition programs come with their own word processors, some can work
with other word processing packages like MS Word and WordPerfect.
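Step 3 above says that a feature value is computed for each short interval of time. A minimal Python sketch of that idea follows, assuming fixed-length overlapping frames and short-time energy as the feature. The function names, frame length, hop size, sample rate, and synthetic test signal are all illustrative assumptions, not details of any particular ASR product.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames of frame_len samples."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames

def short_time_energy(frame):
    """Mean squared amplitude of one frame (a very simple feature value)."""
    return sum(x * x for x in frame) / len(frame)

# Synthetic input (an assumption for illustration): 0.1 s of silence
# followed by 0.1 s of a 200 Hz tone, sampled at 8000 Hz.
RATE = 8000
silence = [0.0] * 800
tone = [math.sin(2 * math.pi * 200 * n / RATE) for n in range(800)]
signal = silence + tone

# 25 ms frames (200 samples) taken every 12.5 ms (100 samples).
frames = frame_signal(signal, frame_len=200, hop=100)
energies = [short_time_energy(f) for f in frames]

# Frames whose energy exceeds a small threshold are treated as "sound present".
active = [e > 0.01 for e in energies]
```

Real systems compute richer spectral features per frame, but the framing step is the same in spirit: the continuous waveform becomes a sequence of short, comparable units.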
VARIATIONS IN SPEECH
The speech-recognition process is complicated because the production of phonemes and
the transition between them varies from person to person and even for the same person. Different
people speak differently. Accents, regional dialects, sex, age, speech impediments, emotional
state, and other factors cause people to pronounce the same word in different ways. Phonemes
are added, omitted, or substituted. For example, the word America is pronounced differently in
parts of New England. The rate of speech also varies from person to person depending upon
a person's habits and regional background.
A word or a phrase spoken by the same individual differs from moment to moment.
Illness, tiredness, stress, or other conditions cause subtle variations in the way a word is spoken at
different times. Also, the voice quality varies in accordance with the position of the person
relative to the microphone, the acoustic nature of the surroundings, or the quality of the
recording devices. The resulting changes in the waveform can drastically affect the performance
of the recognizer.
Classification of speech recognition system
When a speech recognition system requires a word to be spoken individually, in isolation
from other words, it is said to be an isolated word system. It recognizes discrete words only
when they are separated from their neighbors by distinct inter-word pauses, so the user has to
pause before uttering the next word. A continuous word system, on the other hand, is capable of
handling more fluent speech. While a big vocabulary system works with a vocabulary base of a
minimum of a thousand words, a small vocabulary system works with a maximum of a thousand words.
VOCABULARIES FOR COMPUTERS
Each ASR system has an active vocabulary (a set of words from which the recognition
engine tries to make sense of an utterance) and a total vocabulary size (the total number of words in
all possible sets that can be culled from the memory).
The vocabulary size and a system's recognition latency (the allowable time to accurately
recognize an utterance) determine the processing horsepower required of the recognition engine.
An active vocabulary set comprises approximately fourteen words plus "none of the
above", which the recognizer chooses when none of the fourteen words is a good match. The
recognition latency when using a 4-MIPS processor is about 0.5 seconds for a speaker-
independent set. Processing power requirements increase dramatically for LVR sets with
thousands of words. Real-time latencies with a vocabulary base of a few thousand words are possible
only through the use of Pentium-class processors. A small active vocabulary limits a system's
search range, providing advantages in latency and search time. A large total vocabulary enables
a more versatile human interface but affects system memory requirements. A system with a small
active vocabulary for each prompt usually provides faster, more accurate results. Similar-
sounding words in a vocabulary set cause recognition errors, whereas a unique sound for each word
enhances the recognition engine's accuracy.
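The "none of the above" choice described above can be sketched in Python as a nearest-template search with a rejection threshold. This is only an illustration under stated assumptions: the two-number feature vectors, the tiny vocabulary, the threshold value, and the function name `recognize` are all hypothetical.

```python
import math

NONE_OF_THE_ABOVE = "<none of the above>"

def recognize(features, active_vocab, reject_threshold):
    """Return the closest word in the active vocabulary, or reject the utterance."""
    best_word, best_dist = None, float("inf")
    for word, template in active_vocab.items():
        dist = math.dist(features, template)  # Euclidean distance to the template
        if dist < best_dist:
            best_word, best_dist = word, dist
    # If even the best candidate is a poor match, choose "none of the above".
    if best_dist > reject_threshold:
        return NONE_OF_THE_ABOVE
    return best_word

# Hypothetical active vocabulary for one prompt: word -> feature template.
vocab = {"yes": (1.0, 0.0), "no": (0.0, 1.0)}

close_to_yes = recognize((0.9, 0.1), vocab, reject_threshold=0.5)
far_from_all = recognize((5.0, 5.0), vocab, reject_threshold=0.5)
```

Because the loop visits every word in the active set, the search time grows with the size of the active vocabulary, which is the latency trade-off the text describes.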
WHICH SYSTEM TO CHOOSE
In choosing a speech recognition system you should consider the degree of speaker
independence it offers. Speaker-independent systems can provide high recognition accuracies for
a wide range of users without needing to adapt to each user's voice. Speaker-dependent systems
require you to train the system to your voice to attain high accuracy. Speaker-adaptive systems,
an intermediate category, are essentially speaker-independent but can adapt their templates for
each user to improve accuracy.
ADVANTAGES OF SPEAKER INDEPENDENT SYSTEM
The advantage of a speaker-independent system is obvious: anyone can use the system
without first training it. However, its drawbacks are not so obvious. One limitation is the work
that goes into creating the vocabulary templates. To create reliable speaker-independent
templates, someone must collect and process numerous speech samples. This is a time-consuming
task, and creating these templates is not a one-time effort. Speaker-independent templates are
language-dependent, and the templates are sensitive not only to two dissimilar languages but also
to the differences between British and American English. Therefore, as part of your design
activity, you would need to create a set of templates for each language or major dialect that
your customers use. Speaker-independent systems also have a relatively fixed vocabulary
because of the difficulty in creating a new template in the field at the user's site.
THE ADVANTAGE OF A SPEAKER-DEPENDENT SYSTEM
A speaker-dependent system requires the user to train the ASR system by providing
examples of his or her own speech. Training can be a tedious process, but the system has the advantage
of using templates that refer only to the specific user and not some vague average voice. The
result is language independence. You can say ja, si, or ya during training, as long as you are
consistent. The drawback is that the speaker-dependent system must do more than simply match
incoming speech to the templates. It must also include resources to create those templates.
WHICH IS BETTER
For a given amount of processing power, a speaker dependent system tends to provide
more accurate recognition than a speaker-independent system. A speaker independent system is
not necessarily better: the difference in performance stems from the speaker independent
template encompassing wide speech variations.
TECHNIQUES IN VOGUE
The most frequently used speech recognition technique involves template matching, in
which vocabulary words are characterized in memory as templates: time-based sequences of
spectral information taken from waveforms obtained during training.
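One common way to compare an utterance with stored templates when speaking rates differ is dynamic time warping (DTW). The text does not name DTW, so treat the Python sketch below as one possible way to realize template matching, not as the method used by any specific product. The one-dimensional "spectral" sequences and the word templates are made up for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of numbers."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: stretch a, stretch b, or advance both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def match_template(utterance, templates):
    """Return the vocabulary word whose template is closest to the utterance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))

# Hypothetical templates captured during training (one feature value per time step).
templates = {"up": [1, 2, 3, 4], "down": [4, 3, 2, 1]}

# The same rising pattern as "up", spoken more slowly.
slow_up = [1, 1, 2, 3, 3, 4]
best = match_template(slow_up, templates)
```

The warping step lets a slowly spoken word align with a faster template, which addresses the variation in speaking rate discussed earlier.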
As an alternative to template matching, feature-based designs have been used in which a
time sequence of the pertinent phonetic features is extracted from a speech waveform. Different
modeling approaches are used, but models involving state diagrams have been found to give
encouraging performance. In particular, HMMs (hidden Markov models) are frequently applied.
With HMMs any speech unit can be modeled, and all knowledge sources can be included in a
single, integrated model. Various types of HMMs have
been implemented with differing results. Some model each word in the vocabulary, while others
model sub-word speech units.
HIDDEN MARKOV MODEL:
An HMM can be used to model an unknown process that produces a sequence of observable
outputs at discrete intervals, where the outputs are members of some finite alphabet. These
models are called hidden Markov models precisely because the state sequence that produced the
observable output is not known; it is hidden.
An HMM is represented by a set of states, vectors defining transitions between certain pairs
of those states, probabilities that apply to state-to-state transitions, sets of probabilities
characterizing observed output symbols, and initial conditions.
An example is shown in diagram 3, a three-state model, where states are denoted by nodes and
transitions by directed arrows (vectors) between nodes. The underlying model is a Markov chain.
The circles represent states of the speaker's vocal system: specific configurations of the tongue, lips,
etc. that produce a given sound. The arrows represent possible transitions from one state to
another. At any given time, the model is said to be in one state. At each clock time, the model might
change from its current state to any state for which a transition vector exists.
A transition may occur only from the tail to the head of a vector. A state can have more than one
transition leaving it and more than one leading to it.
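The pieces listed above (states, transition probabilities, output probabilities, and initial conditions) are enough to compute how likely a model is to have produced an observed sequence. The Python sketch below uses the standard forward algorithm on a three-state, left-to-right model. All probabilities and the two-symbol alphabet are invented for illustration and are not taken from the diagram.

```python
def forward_probability(observations, initial, transitions, emissions):
    """Probability that the HMM emits `observations`, summed over all hidden state paths."""
    n_states = len(initial)
    # alpha[s] = P(observations so far, model currently in state s)
    alpha = [initial[s] * emissions[s][observations[0]] for s in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[p] * transitions[p][s] for p in range(n_states)) * emissions[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)

# Hypothetical three-state model: each state may stay put or move one state forward.
initial = [1.0, 0.0, 0.0]              # always start in state 0
transitions = [
    [0.5, 0.5, 0.0],                   # state 0 -> state 0 or state 1
    [0.0, 0.5, 0.5],                   # state 1 -> state 1 or state 2
    [0.0, 0.0, 1.0],                   # state 2 is final
]
emissions = [                          # P(output symbol | state)
    {"a": 0.9, "b": 0.1},
    {"a": 0.2, "b": 0.8},
    {"a": 0.5, "b": 0.5},
]

p_ab = forward_probability(["a", "b"], initial, transitions, emissions)
```

A recognizer built this way would keep one such model per word (or per sub-word unit) and pick the model that gives the observed sequence the highest probability.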
WHAT LIES AHEAD?
If we were to draw a path that told us where speech and computers go from here, we
would see that we have moved only two steps.
In the first, we created systems that would be able to listen to sounds and convert them
into words. That was the era of discrete speech programs that listen to each distinct word spoken
by you and then put them down. Though slow and tedious, it announced the arrival of speech
technologies on the desktop.
In the second step, we got continuous speech recognition. You no longer have to speak
out each word separately; instead you can just talk naturally and the program understands what
you say.
The third step is speech understanding. This technology is the one that will actually mark
the biggest change in the way we use our computers. It will necessitate a complete overhaul of
our operating systems, our word processors, our spreadsheets, just about
everything. It will also mark the emergence of a computer like HAL. When speech
understanding arrives in its true form, it will allow your computer to make sense of commands
like 'wake me up at six in the morning, you ghonchu.' Till then we can only wait and watch.
NEED FOR BUILDING AN IDEAL SYSTEM:
Speech recognition systems would ideally have to recognize continuous words in a given
language, spoken by any person, 100 percent of the time. This is better than most humans can do,
especially given the variety of different accents which people have. The accuracy of speech
recognition systems grows and matures with time.
The best speech recognition systems today generally can recognize either small
vocabularies over a diverse group of speakers, or large vocabularies for only one speaker. A
speaker-dependent system can be custom built for a single speaker to recognize isolated words or
continuous speech. Clearly, the ideal system would recognize continuous speech over a large
vocabulary and be speaker independent. Unfortunately, we have not yet succeeded in building
such a system. Currently, speech recognition systems are best at recognizing long words with
many distinct features. The easiest systems to build are speaker-dependent, isolated-word
recognition systems. With a drop in equipment cost and an increase in the processing speed of
processors, speech recognition products will soon be playing a major role in more demanding
jobs.
EVALUATION
Funders have been very keen on competitive quantitative evaluation.
Subjective evaluations are informative, but not cost-effective.
For transcription tasks, word-error rate is popular (though it can be misleading: not all words
are equally important).
For task-based dialogues, other measures of understanding are needed.
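Word-error rate is usually computed as the number of word substitutions, deletions, and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of words in the reference. A minimal Python sketch under that definition follows; the function name and the example sentences are illustrative.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the two strings, divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                           # reference word missing from output
                curr[j - 1] + 1,                       # extra word in output
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)

# One dropped word ("up") and one substitution ("six" -> "sick") out of five words.
wer = word_error_rate("wake me up at six", "wake me at sick")
```

Every error counts the same here, which is exactly why the measure can mislead: dropping "six" hurts the user far more than dropping "up".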
REMAINING PROBLEMS
Robustness – graceful degradation, not catastrophic failure
Portability – independence of computing platform
Adaptability – to changing conditions (different mic, background noise, new speaker,
new task domain, new language even)
Language Modelling – is there a role for linguistics in improving the language models?
Confidence Measures – better methods to evaluate the absolute correctness of
hypotheses.
Out-of-Vocabulary (OOV) Words – Systems must have some method of detecting OOV
words, and dealing with them in a sensible way.
Spontaneous Speech – disfluencies (filled pauses, false starts, hesitations, ungrammatical
constructions, etc.) remain a problem.
Prosody – stress, intonation, and rhythm convey important information for word
recognition and the user's intentions (e.g., sarcasm, anger).
Accent, dialect and mixed language – non-native speech is a huge problem, especially
where code-switching is commonplace.
BENEFITS
Security:
This technology creates a powerful interface between man and computer. Because the
voice recognition system responds only to prerecorded voices, it is difficult to
tamper with data or break any codes that have been created.
Productivity:
It decreases work, as all operations are done through voice recognition. Paperwork
is reduced to a minimum and the user can feel relaxed irrespective of the work.
Advantage for handicapped and blind:
This technology is a great boon for blind and handicapped users, as they can utilize voice
recognition technology for their work.
Usability of other languages increases:
Speech recognition technology needs only a voice, and the voice is recorded irrespective of
the language in which it is delivered, so the technology can be used in any
language.
Personal voice macros can be created:
Everyday tasks like sending mail, receiving mail, and drafting documents can be done easily,
and many tasks can be carried out faster.
DRAWBACKS
If the system has to work in noisy environments, background noise may corrupt the
original data and lead to misinterpretation.
If words are pronounced alike, for example "their" and "there", this technology faces
difficulty in distinguishing them.
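Since words like "their" and "there" sound the same, the acoustic signal alone cannot separate them; the language-analysis stage has to use context. The Python sketch below shows one simple way this could work, choosing the candidate that most often follows the previous word. The bigram counts are toy numbers invented for this example, not measurements from any corpus, and `pick_homophone` is a hypothetical helper.

```python
def pick_homophone(previous_word, candidates, bigram_counts):
    """Choose the candidate seen most often after previous_word (ties go to the first)."""
    return max(candidates, key=lambda word: bigram_counts.get((previous_word, word), 0))

# Toy counts of (previous word, word) pairs; invented for illustration only.
toy_counts = {
    ("over", "there"): 9,
    ("over", "their"): 1,
    ("lost", "their"): 7,
    ("lost", "there"): 1,
}

after_over = pick_homophone("over", ["their", "there"], toy_counts)
after_lost = pick_homophone("lost", ["their", "there"], toy_counts)
```

When the context has never been seen, all candidates score zero and the choice is arbitrary, so this approach reduces the problem without removing it.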
CONCLUSION
Voice recognition promises a rosy future and offers a wide variety of services. The next
generation of voice recognition technology is based on neural networks, an
artificial intelligence technology. They are formed by interconnected nodes which process
the input in parallel for fast evaluation. Like human beings, they learn new patterns of speech
automatically.