Speech recognition - how does it work?
The first device for speech recognition appeared in 1952, and it could understand numbers
spoken by a person. About 40 years later, the first commercial programs that recognize human
speech were introduced. They were intended for people who, for physical reasons,
could not type by hand. Now speech recognition is available on
almost any smartphone; it lets us interact with voice applications, making our lives
easier and more relaxed. This article explains how speech recognition works.
The applications most commonly associated with the term "voice search" rely on
speech recognition systems, and often on speech synthesis, to return search results
automatically. Voice search is typically used to:
search for companies by name or category;
search for a person in a contact list;
search for information such as finances, weather, news, traffic congestion, or
movie theater listings (this kind of search is often used to navigate multi-level voice menus).
How voice recognition is used in real life
When you speak a voice request, for example a destination address, the smartphone does
not hear a street and a house number, but a sound signal in which the sounds flow smoothly
into each other, without clear boundaries. It is worth noting that the same phrase, uttered
by different people in different situations, produces quite different signals.
The voice request is recorded by the smartphone and sent to the servers.
There, the level of interference is estimated, the noise is removed, and the useful signal is
separated. The recording is then divided into small fragments (frames), for example 25
milliseconds long with a step of 10 milliseconds, so that neighboring frames overlap. Thus,
one second of speech produces about a hundred frames.
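The framing step described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any particular recognizer: the 16 kHz sample rate and the function name `split_into_frames` are assumptions made for the example, and the final partial frame is simply dropped rather than padded.

```python
def split_into_frames(samples, sample_rate=16000, frame_ms=25, step_ms=10):
    """Split a sequence of audio samples into overlapping frames.

    sample_rate is an assumed value (16 kHz); frame_ms and step_ms follow
    the 25 ms / 10 ms figures from the text.
    """
    frame_len = sample_rate * frame_ms // 1000  # 400 samples at 16 kHz
    step_len = sample_rate * step_ms // 1000    # 160 samples at 16 kHz
    frames = []
    start = 0
    # Keep only full frames; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += step_len
    return frames


one_second = [0.0] * 16000          # one second of silence at 16 kHz
frames = split_into_frames(one_second)
print(len(frames))                  # 98 full frames, i.e. roughly a hundred
```

Because the step (10 ms) is shorter than the frame (25 ms), each frame shares 15 ms of audio with the next one. Without padding, one second yields 98 full frames; padding the end of the signal would bring the count to 100.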
Machine Learning processing
First, each frame is passed through the acoustic model. A machine learning algorithm
determines candidate spoken words and their context. The correctness of the results depends
directly on the completeness of the system's phonetic alphabet. For each sound, a complex
statistical model is built in advance that describes how this sound is pronounced in speech.
The recognition system matches the incoming speech signal with phonemes, and then
assembles words from them. Each frame is mapped not to a single phoneme, but to several that
match with varying degrees of probability. In addition, the system takes into account the
probability of transitions, that is, it determines which phonemes can follow a particular
phoneme. For this purpose, data on pronunciation, morphology, and semantics are used. Thus,
the system selects candidate words, which are then analyzed for word forms, parts of speech,
and possible statistical relationships between them.
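To make the interplay of per-frame phoneme probabilities and transition probabilities concrete, here is a minimal sketch that uses the Viterbi algorithm, a standard technique for this kind of problem. The text does not say which algorithm a given system uses, so treat this as one possible illustration. All numbers, the three-phoneme alphabet, and the function name `viterbi` are invented for the example.

```python
import math


def viterbi(frame_probs, transition, initial):
    """Return the most likely phoneme sequence for the given frames.

    frame_probs: list of dicts, one per frame, phoneme -> probability
    transition:  dict of dicts, transition[prev][cur] -> probability
    initial:     dict, phoneme -> probability of starting with it
    All probabilities must be greater than zero (logs are taken).
    """
    phonemes = list(initial)
    # best[p] = (log-probability of the best path ending in p, that path)
    best = {
        p: (math.log(initial[p]) + math.log(frame_probs[0][p]), [p])
        for p in phonemes
    }
    for probs in frame_probs[1:]:
        new_best = {}
        for cur in phonemes:
            # Pick the best predecessor for the current phoneme.
            score, path = max(
                (best[prev][0] + math.log(transition[prev][cur]), best[prev][1])
                for prev in phonemes
            )
            new_best[cur] = (score + math.log(probs[cur]), path + [cur])
        best = new_best
    return max(best.values())[1]


# Toy data (hypothetical): four frames of the word "cat".
frame_probs = [
    {"k": 0.7, "ae": 0.2, "t": 0.1},
    {"k": 0.4, "ae": 0.5, "t": 0.1},
    {"k": 0.1, "ae": 0.6, "t": 0.3},
    {"k": 0.1, "ae": 0.3, "t": 0.6},
]
transition = {
    "k": {"k": 0.5, "ae": 0.45, "t": 0.05},
    "ae": {"k": 0.05, "ae": 0.5, "t": 0.45},
    "t": {"k": 0.05, "ae": 0.05, "t": 0.9},
}
initial = {"k": 0.8, "ae": 0.1, "t": 0.1}

best_path = viterbi(frame_probs, transition, initial)
print(best_path)  # ['k', 'ae', 'ae', 't']
```

Each frame keeps several phoneme candidates with their probabilities, and the transition table penalizes unlikely jumps, so the chosen path is the one that is best overall rather than best frame by frame.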
Next, a language model enters the process. With it, the system determines the probable
word order and, if necessary, restores unrecognized words based on the context.
The resulting information is sent to the central unit of the recognition system, the
decoder. This software component combines data from the acoustic and language models and,
based on their combination, produces the final result in the form of the most likely sequence
of words.
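The decoder's job of combining the two models can be sketched as follows. This is a simplified illustration under stated assumptions: the candidate transcriptions, their acoustic scores, the bigram probabilities, and the names `language_log_prob` and `decode` are all hypothetical, and real decoders search a far larger space of hypotheses than two fixed candidates.

```python
import math

# Hypothetical acoustic scores: how well each word sequence matches the audio.
candidates = {
    ("recognize", "speech"): math.log(0.40),
    ("wreck", "a", "nice", "beach"): math.log(0.45),
}

# Hypothetical bigram language model: P(word | previous word).
bigrams = {
    ("<s>", "recognize"): 0.01,
    ("recognize", "speech"): 0.2,
    ("<s>", "wreck"): 0.001,
    ("wreck", "a"): 0.1,
    ("a", "nice"): 0.05,
    ("nice", "beach"): 0.02,
}


def language_log_prob(words, bigrams, floor=1e-6):
    """Score a word sequence with a bigram model; unseen pairs get a small floor."""
    total = 0.0
    prev = "<s>"  # sentence-start marker
    for word in words:
        total += math.log(bigrams.get((prev, word), floor))
        prev = word
    return total


def decode(candidates, bigrams, lm_weight=1.0):
    """Return the candidate with the best combined acoustic + language score."""
    return max(
        candidates,
        key=lambda words: candidates[words]
        + lm_weight * language_log_prob(words, bigrams),
    )


print(decode(candidates, bigrams))                 # ('recognize', 'speech')
print(decode(candidates, bigrams, lm_weight=0.0))  # ('wreck', 'a', 'nice', 'beach')
```

With the language model switched off (`lm_weight=0.0`), the acoustically closer but implausible phrase wins; once word-order probabilities are taken into account, the more likely sentence is selected.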
Integrating speech recognition and voice commands into a website
If you want to integrate speech recognition into your website, you can look for tutorials
on the internet that use the browser's SpeechRecognition API (part of the Web Speech API).
An even easier option is to install Voxpow, a speech recognition tool for websites that we found.
It is the first online tool for adding voice commands to a website and controlling everything
from a single point. It lets you add voice control quite easily and for free.
Big players in the speech recognition world
Google
The well-known IT corporation lets you test the speech recognition product of its Google
Cloud Platform online. Anyone can try out the service for free. The product itself is
convenient and clear to use.
Pluses:
support for more than 80 languages;
fast processing of named entities;
high-quality recognition in conditions of poor communication and in the presence of
extraneous sounds.
Minuses:
difficulty recognizing speech with accents or poor pronunciation, which
makes the system hard to use for anyone other than native speakers;
lack of clear technical support for the service.
Yandex
Speech recognition from Yandex is available in several ways:
via a cloud service;
via a library for access from mobile applications;
via a JavaScript API.
Pluses:
easy to use and configure;
good recognition of speech in Russian;
the system returns several candidate transcriptions and uses neural networks to pick
the most plausible one.
Minuses:
some words may not be recognized correctly during streaming.
Azure
The Azure speech service was developed by Microsoft. Compared with similar services, it
stands out strongly because of its price. However, be prepared to face some difficulties.
Pluses:
relative to other services, Azure processes messages very quickly in real time.
Minuses:
the system is very sensitive to accents and hardly recognizes speech from non-native speakers;
the system works only in English.
Overview
Thanks to machine learning, modern systems are resistant to noise and can recognize speech
with an accent. The accuracy of modern speech recognition systems exceeds 90 percent. We are
very close to a time when speech recognition technologies will be used in every aspect of
our lives.

