A
Seminar
ON
“AI For Speech Recognition”
Submitted in partial fulfilment for the award of Degree of Bachelor of Technology
Session: 2022-23
SUBMITTED TO:
Dr. Garima Mathur
(Head & Professor)
Guided By:
Dr. Payal Bansal
(Associate Professor)
SUBMITTED BY:
Akash Singh (PCE19EC005)
Department Of
Electronics and Communication Engineering
Poornima College of Engineering, Jaipur
CONTENT LIST
• Artificial Intelligence (or AI)
• Speech Recognition
• Speech Recognition in Cell Phones
• Performance of Speech Recognition Systems
• HMM-based Speech Recognition
• DTW-based Speech Recognition
• Applications of Speech Recognition
• Failures of Speech Recognition
• Supporting Speech Recognition
• Conclusion
• Thank You
Artificial Intelligence (or AI)
Artificial intelligence is the study and design of intelligent agents. The term is also used to describe a property of machines or programs.
Among the traits researchers hope machines will exhibit are reasoning, knowledge, planning, learning, communication, perception, and the ability to move and manipulate objects.
Applications of AI
• Pattern Recognition
• Hand Recognition
• Speech Recognition
• Natural Language Processing
• Face Recognition
• Artificial Creativity
• Nonlinear Control and Robotics
Speech Recognition
Speech recognition converts spoken words to machine-readable input. It is also called voice recognition.
Applications of speech recognition include:
• voice dialing
• content-based spoken audio search
• speech-to-text processing
Audio-visual speech recognition also exists; it uses lip reading in addition to the audio signal.
Speech Recognition in Cell Phones
• The caller's words are captured and digitized by the speech-recognition system.
• The digitized voice is split into individual frequency components, called spectral representations.
• The components are translated into phonemes.
• Complex models and algorithms determine a likely translation.
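The second step above (splitting digitized audio into frequency components) can be sketched as follows. This is a minimal illustration, not the method any particular phone uses: the sample rate, frame length, hop size, and the synthetic 1000 Hz test tone are all assumptions chosen for the demo, and a real front end would typically apply a window function and an FFT rather than this naive DFT.

```python
import cmath
import math

def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping fixed-length frames."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..N/2 (O(N^2), fine for a demo)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

# Assumed demo parameters (hypothetical, not from the slides).
SAMPLE_RATE = 8000   # samples per second
FRAME_LEN = 64       # samples per frame
HOP = 32             # step between frame starts
TONE_HZ = 1000       # synthetic test tone standing in for digitized speech

signal = [math.sin(2 * math.pi * TONE_HZ * t / SAMPLE_RATE) for t in range(256)]

frames = frame_signal(signal, FRAME_LEN, HOP)
spectra = [magnitude_spectrum(f) for f in frames]

# Strongest frequency component in the first frame.
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
print(len(frames), "frames; strongest component at", peak_hz, "Hz")
```

Each entry of `spectra` is one spectral representation of a short slice of audio. Later stages (phoneme models, language models) would operate on features derived from such frames.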
Performance of Speech Recognition Systems
• Performance is usually specified in terms of accuracy and speed. Accuracy is usually rated with the word error rate (WER), whereas speed is measured with the real-time factor.
• Dictation machines can achieve very high performance in controlled conditions and require only a short period of training.
Optimal conditions usually assume that users:
• Have speech characteristics that match the training data.
• Can achieve proper speaker adaptation.
• Work in a clean, noise-free environment.
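Both metrics named above can be computed in a few lines. The sketch below assumes the common definitions: WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words, and the real-time factor is processing time divided by audio duration. The function names are illustrative, and the sentence pair is the "sound-alike" example from the Failures slide of this deck.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,               # deletion
                           dp[i][j - 1] + 1,               # insertion
                           dp[i - 1][j - 1] + substitution)
    return dp[-1][-1] / len(ref)

def real_time_factor(processing_seconds, audio_seconds):
    """RTF below 1.0 means the recognizer runs faster than real time."""
    return processing_seconds / audio_seconds

wer = word_error_rate("I sure look forward to seeing you",
                      "Assure look forward to seen in you")
rtf = real_time_factor(processing_seconds=5.0, audio_seconds=10.0)
print(f"WER = {wer:.3f}, RTF = {rtf:.2f}")
```

For this sentence pair the minimum is four edits against seven reference words, so the WER is 4/7.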
There are two statistically based models for speech recognition:
• Hidden Markov Model (HMM)
• Dynamic Time Warping (DTW)
HMM-based Speech Recognition
HMMs are statistical models that output a sequence of symbols or quantities.
Two reasons why HMMs are popular and widely used:
• A speech signal can be viewed as a piecewise stationary signal.
• They can be trained automatically and are simple and computationally feasible to use.
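To make the idea of hidden states emitting a symbol sequence concrete, here is a minimal Viterbi decoder for a toy two-state HMM. Everything in it is an assumption for illustration: the state names, the observation symbols, and all probabilities are made up, whereas a real recognizer would learn them from data and emit acoustic feature vectors rather than two discrete symbols.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its probability) for a discrete HMM."""
    # best[t][s] = probability of the best path ending in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] * trans_p[p][s])
            best[t][s] = (best[t - 1][prev] * trans_p[prev][s]
                          * emit_p[s][observations[t]])
            back[t][s] = prev
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]

# Hypothetical toy model: two phoneme-like states, two quantized symbols.
states = ["A", "B"]
start_p = {"A": 0.6, "B": 0.4}
trans_p = {"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.4, "B": 0.6}}
emit_p = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.2, "y": 0.8}}

path, prob = viterbi(["x", "x", "y", "y"], states, start_p, trans_p, emit_p)
print(path, prob)
```

With these made-up numbers the decoder stays in state A while it sees "x" and switches to B when the symbols change to "y". The piecewise-stationary assumption means each state models a short stretch of signal with roughly constant statistics.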
DTW-based Speech Recognition
• Dynamic time warping is an algorithm for measuring similarity between two sequences that may vary in time or speed. It is a historical approach to speech recognition.
Similarities between speaking patterns can be detected even when the speaking speed differs. DTW has been applied to video, audio, and graphics; indeed, any data that can be turned into a linear representation can be analyzed with DTW.
• This sequence-alignment technique is also used with HMM-based models.
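A minimal sketch of the DTW distance between two one-dimensional sequences follows. It uses the textbook dynamic-programming recurrence with absolute difference as the local cost. The example sequences are invented, and a speech system would compare sequences of feature vectors rather than single numbers.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # a[i-1] repeats
                                     cost[i][j - 1],      # b[j-1] repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]

fast = [1, 2, 3, 2, 1]
slow = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]   # same shape, spoken at half speed
different = [3, 3, 3, 3, 3]

print(dtw_distance(fast, slow))       # same pattern at another speed
print(dtw_distance(fast, different))  # a genuinely different pattern
```

Because the alignment may stretch either sequence, the slowed-down copy matches the original with zero cost, which is the property that made DTW useful for comparing utterances spoken at different speeds.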
Applications of Speech Recognition
• Health Care: Even with the rise of speech recognition technologies, medical transcriptionists (MTs) have not become obsolete.
• Military: Speech recognizers have been operated successfully in high-performance fighter aircraft.
Some important conclusions from the work are as follows:
1. Speech recognition has definite potential for reducing pilot workload, but this potential was not realized consistently.
2. Achievement of very high recognition accuracy (95% or more) was the most critical factor for making the speech recognition system useful; with lower recognition rates, pilots would not use the system.
3. More natural vocabulary and grammar, and shorter training times, would be useful, but only if very high recognition rates could be maintained.
• Helicopters: As in fighter applications, the overriding issue for voice in helicopters is the impact on pilot effectiveness.
• Battle Management: Speech recognition equipment was tested in conjunction with an integrated information display for naval battle management applications.
• Telephony and other domains: ASR in the field of computer gaming and simulation is becoming more widespread.
• People with disabilities: They are another part of the population that benefits from speech recognition programs.
Failures of Speech Recognition
• The computer has trouble with "sound-alike" errors. It's hard to get mad at the computer for not recognizing mumbling, but it can be frustrating when you think you are speaking clearly and it just isn't good enough.
For example, when I said: "I sure look forward to seeing you"
The computer heard: "Assure look forward to seen in you"
When I repeated the same words with better enunciation, the computer got it right.
• Using laptops on the road.
Supporting Speech Recognition
To be successful, most firms should:
• Set up a pilot group of patient lawyers to try it out. Have the "techies" use the PC directly.
• Only use software that lets the secretary or proofreader listen to what was dictated.
• Have a pilot group of lawyers who already use dictating machines enter text using a dictating machine.
• When you roll the system out to the firm, be ready with personal trainers and floor support.
• Require everything to be proofread carefully.
Paper-1
Authors: Khaled M. Alhawiti
Abstract:
This research study aims to present a retrospective study about speech recognition systems and artificial
intelligence. Speech recognition has become one of the widely used technologies, as it offers great opportunity to
interact and communicate with automated machines. Precisely, it can be affirmed that speech recognition
facilitates its users and helps them to perform their daily routine tasks, in a more convenient and effective manner.
This research intends to present the illustration of recent technological advancements, which are associated with
artificial intelligence. Recent researches have revealed the fact that speech recognition is found to be the utmost
issue, which affects the decoding of speech. In order to overcome these issues, different statistical models were
developed by the researchers. Some of the most prominent statistical models include acoustic model (AM),
language model (LM), lexicon model, and hidden Markov models (HMM). The research will help in
understanding all of these statistical models of speech recognition. Researchers have also formulated different
decoding methods, which are being utilized for realistic decoding tasks and constrained artificial languages. These
decoding methods include pattern recognition, acoustic phonetic, and artificial intelligence. It has been recognized
that artificial intelligence is the most efficient and reliable methods, which are being used in speech recognition.
Paper-2
Authors: Kyu Jeong Han, ASAPP
Abstract:
It was Stanley Kubrick that first pictured the aspiration of mankind to create artificial intelligences that can
communicate with humans in a movie titled "2001: A Space Odyssey", but even before this 1968 motion
picture, we, human beings, had made incessant effort to develop human-like intellectual systems. The
pursuit of human-level automatic speech recognition (ASR) technology, along the same line, has its own
history that has stimulated a significant deal of technological advances throughout the journey. This Industry
Expert talk reviews the recent history of the Odyssey by the speech signal processing/machine learning
communities to achieve or even exceed the human parity in ASR systems, focusing on the breakthroughs
made in the deep learning era in the context of Switchboard and LibriSpeech, the two most widely-adopted
standard benchmark datasets.
Paper-3
Authors:
Song Li, Mustafa Oskin Yerebakan, Yue Luo, Ben Amaba, William Swope
Abstract:
Voice recognition has become an integral part of our lives, commonly used in call centers and as
part of virtual assistants. However, voice recognition is increasingly applied to more industrial uses. Each of
these use cases has unique characteristics that may impact the effectiveness of voice recognition, which
could impact industrial productivity, performance, or even safety. One of the most prominent among them is
the unique background noises that are dominant in each industry. The existence of different machinery and
different work layouts are primary contributors to this. Another important characteristic is the type of
communication that is present in these settings. Daily communication often involves longer sentences
uttered under relatively silent conditions, whereas communication in industrial settings is often short and
conducted in loud conditions.
Paper-4
Authors:
Hui Liu, in Robot Systems for Rail Transit Applications, 2020
Abstract:
Speech recognition is the process of converting human sound signals into words or instructions. Speech
recognition is based on speech. It is an important research direction of speech signal processing and a branch
of pattern recognition. The research of speech recognition involves many subject areas such as computer
technology, artificial intelligence, digital signal processing, pattern recognition, acoustics, linguistics, and
cognitive science. It is a multidisciplinary comprehensive research field.
Future Scope
• Speech recognition software can convert spoken words into text, shown as closed captions, so that a person
with hearing loss can understand what others are saying. Speech recognition can also enable those with
limited use of their hands to work with computers, using voice commands instead of typing. Court
reporting is another use.
• Professional speech recognition technology can reduce repetitive tasks so professionals can focus on other
things, like their clients, patients, and other aspects of the business. It also allows businesses to save money
by automating processes and doing administrative tasks more quickly.
Conclusion
• This paper presents speech recognition in artificial intelligence systems. It is important to
consider the environment in which the speech recognition system has to work.
• The grammar used by the speaker and accepted by the system, the noise level and type, the position of
the microphone, and the speed and manner of the user's speech are some of the factors that may affect the
quality of speech recognition.
THANK YOU
