Speech Recognition
Amit Sharma
1310751033
CSE 8th
SPEECH RECOGNITION
A process that enables computers to recognize
and translate spoken language into text. It is also
known as "automatic speech recognition" (ASR),
"computer speech recognition", or just "speech to text"
(STT).
APPLICATIONS
• Medical Transcription
• Military
• Telephone and similar domains
• Serving the disabled
• Home automation system
• Automobile
• Voice dialing (“Call home”)
• Data entry (“A PIN number”)
• Speech-to-text processing (word processors, emails)
RECOGNITION PROCESS
[Diagram: Voice input → Analog to digital → Speech engine (acoustic model, language model) → Output, with feedback]
HOW DO HUMANS DO IT?
Articulation produces sound
waves which the ear conveys
to the brain for processing
HOW MIGHT COMPUTERS DO IT?
[Diagram: acoustic waveform → acoustic signal → speech recognition]
• Digitization
• Acoustic analysis of the
speech signal
• Linguistic interpretation
FLOW SUMMARY OF RECOGNITION PROCESS
• User Input:
The system captures the user’s voice in the form of an
analog acoustic signal.
• Digitization:
Digitize the analog signal.
• Phonetic Breakdown:
Break the signal into phonemes.
• Statistical Modeling:
Map phonemes to their phonetic representation using
a statistical model.
• Matching:
Using the grammar, the phonetic representation and the
dictionary, the system returns a word plus a confidence
score.
TYPES OF SPEECH RECOGNITION
• SPEAKER INDEPENDENT:
Recognizes speech from a large group of people
• SPEAKER DEPENDENT:
Recognizes speech patterns from only one person
• SPEAKER ADAPTIVE:
The system usually begins with a speaker-independent
model and adjusts it more closely to each individual
during a brief training period
Approaches to SR
• Statistics based
• Template based
Template-based approach
• Store examples of units (words, phonemes),
then find the example that most closely fits the
input
• Essentially a complex similarity-matching problem
• Works for discrete utterances and a single user
• Hard to distinguish very similar templates
• Quickly degrades when the input differs from the
template
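The slides do not say how “most closely fits” is measured. A classic choice for template matching is dynamic time warping (DTW), swapped in here as an assumption: it stretches or compresses time so that a slowly spoken word can still line up with its stored template. The function names, the 2-D feature frames and the two templates below are hypothetical; this is a minimal sketch, not a production recognizer.

```python
import math

def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest way to align the first i frames of a with the first j frames of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])        # distance between the two frames
            cost[i][j] = d + min(cost[i - 1][j],     # a moves on, b's frame is reused
                                 cost[i][j - 1],     # b moves on, a's frame is reused
                                 cost[i - 1][j - 1]) # both move on
    return cost[n][m]

def recognize(utterance, templates):
    """Return the label of the stored template that most closely fits the input."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))

# Hypothetical 2-D feature frames; a real system would store e.g. spectral features.
templates = {"yes": [(1, 0), (2, 0), (3, 0)], "no": [(0, 3), (0, 2), (0, 1)]}
slow_yes = [(1, 0), (1, 0), (2, 0), (2, 0), (3, 0)]  # same word, spoken more slowly
print(recognize(slow_yes, templates))  # yes
```

Because DTW only ever compares the input with stored examples, it also shows the weaknesses listed above: two very similar templates give nearly equal distances, and a speaker whose frames look unlike every stored example matches nothing well.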
Statistics based approach
• Collect a large corpus of transcribed speech
recordings
• Train the computer to learn the correspondences between
the audio and its transcriptions (machine learning)
• At run time, apply the statistical processes to search
through the space of all possible solutions, and pick
the statistically most likely one
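“Pick the statistically most likely one” is commonly written as choosing the word sequence W that maximizes P(audio | W) · P(W): an acoustic model scores how well W explains the sound, and a model trained on text scores how plausible W is on its own. The sketch below assumes both probabilities were already learned from a corpus; the two hypotheses and their numbers are made up for illustration, and real systems search the space with algorithms such as Viterbi or beam search rather than listing every candidate.

```python
import math

def most_likely(hypotheses, acoustic_prob, prior_prob):
    """Return the word sequence W that maximizes P(audio | W) * P(W).

    Scores are added in log space, which avoids numeric underflow when
    many small probabilities are multiplied together.
    """
    def log_score(w):
        return math.log(acoustic_prob[w]) + math.log(prior_prob[w])
    return max(hypotheses, key=log_score)

# Hypothetical scores: the acoustic model slightly prefers the second phrase,
# but statistics learned from text make the first one far more likely overall.
acoustic = {"recognize speech": 0.30, "wreck a nice beach": 0.40}
prior = {"recognize speech": 1e-5, "wreck a nice beach": 1e-8}
print(most_likely(list(acoustic), acoustic, prior))  # recognize speech
```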
What’s Hard About That?
• Digitization:
Analog signals into Digital representation
• Signal Processing:
Separating speech from background noise
• Phonetics:
Variability in human speech
• Channel Variability:
The quality and position of the microphone and the background
environment affect the output
SPEECH RECOGNITION THROUGH THE DECADES
- 1950s-60s (Baby Talk)
• Early systems focused on numbers
• They recognized only digits
• In 1962, IBM developed ‘Shoebox’, which could recognize 16 words
spoken in English
- 1970s (SR Takes Off)
• The U.S. DoD’s DARPA initiated a research program called the Speech
Understanding Research program.
• One of its results was Carnegie Mellon’s ‘HARPY’ system, which could
understand 1,011 words.
• Threshold Technology, the first commercial speech recognition company,
was set up, and Bell Laboratories introduced a system that could
interpret multiple people's voices.
- 1980s (SR Turns Toward Prediction)
• SR vocabulary jumped from a few hundred words to several
thousand words
• One major reason was a new statistical method known as the hidden
Markov model (HMM).
• Rather than simply using templates for words and looking for sound
patterns, HMMs considered the probability of unknown sounds being
words.
• Programs took discrete dictation, so you had … to … pause … after …
each … and … every … word.
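To make “the probability of unknown sounds being words” concrete, here is a minimal sketch assuming one small hidden Markov model per word. The forward algorithm adds up the probability of every path through a word’s states that could have produced the observed sounds, and the word whose model gives the highest total wins. The symbols 0 and 1 stand in for quantized acoustic frames, and all numbers and names are hand-picked and hypothetical; real systems learn the parameters from data (for example with the Baum-Welch algorithm) and use continuous acoustic features.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm.

    start[s]    -- probability of starting in state s
    trans[s][t] -- probability of moving from state s to state t
    emit[s][o]  -- probability that state s emits observation symbol o
    """
    states = range(len(start))
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = [start[s] * emit[s][obs[0]] for s in states]
    for o in obs[1:]:
        alpha = [sum(alpha[r] * trans[r][s] for r in states) * emit[s][o]
                 for s in states]
    return sum(alpha)

# Hypothetical two-state, left-to-right word models over two acoustic symbols.
WORD_MODELS = {
    "yes": ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]]),
    "no":  ([1.0, 0.0], [[0.6, 0.4], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]]),
}

def recognize_word(obs, models=WORD_MODELS):
    """Pick the word whose HMM is most likely to have produced obs."""
    return max(models, key=lambda w: forward_likelihood(obs, *models[w]))

print(recognize_word([0, 0, 1, 1]))  # yes
```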
- 1990s (Automatic Speech Recognition)
• In the '90s, computers with faster processors finally
arrived, and speech recognition software became
viable for ordinary people.
• Dragon NaturallySpeaking arrived. The application
recognized continuous speech, so one could speak, well,
naturally, at about 100 words per minute. However,
users had to spend about 45 minutes training it.
- 2000s
• Accuracy topped out at around 80%
• In 2002, Google Voice Search was released, allowing users to
search Google by speaking into a mobile phone or computer
• In 2011, Apple’s Siri was released. It is a built-in "intelligent assistant" that
lets Apple users speak voice commands to operate the
mobile device and its apps
• In 2014, Microsoft’s Cortana was released. It is also a built-in “intelligent personal
assistant” that can set reminders, recognize natural voice without
requiring keyboard input, and answer questions using
information from the Bing search engine.
Artificial Neural Net
DO IT YOURSELF
[Figure: sound wave saying ‘Hello’ and an artificial neural net; 0011100101]
• But we aren’t quite there yet.
• The big problem is that speech varies in speed
• One person might say “hello!” very quickly and another
person might say “heeeelllllllllllllooooo!” very slowly,
producing a much longer sound file with much more
data. Both sound files should be recognized as exactly
the same text — “hello!”
• Automatically aligning audio files of various lengths to a
fixed-length piece of text turns out to be pretty hard
• To work around this, we have to use some special tricks
and extra processing in addition to a deep neural
network. Let’s see how it works!
- The first step in speech recognition is obvious —
we need to feed sound waves into a computer.
- But sound is transmitted as waves. How do we turn
sound waves into numbers?
Turning Sounds into Bits
A waveform of someone saying “Hello”
Let’s zoom in on one tiny part of the sound wave and
take a look:
To turn this sound wave into numbers, we just record
the height of the wave at equally spaced points:
• This is called sampling.
• We are taking a reading thousands of times a second
and recording a number representing the height of the
sound wave at that point in time.
• We sample at 16 kHz (16,000 samples/sec).
• Let’s sample our “Hello” sound wave 16,000 times per
second. Here are the first 100 samples:
Each number represents the amplitude of the sound wave at 1/16000th of a second intervals
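Sampling can be sketched in a few lines. Real audio would come from a microphone through an analog-to-digital converter; as an assumption, a pure 440 Hz sine tone stands in for the “Hello” recording, and `sample_tone` is a hypothetical helper.

```python
import math

SAMPLE_RATE = 16_000  # readings per second, as on the slide

def sample_tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Record the height of a pure sine wave at equally spaced points in time."""
    n = round(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

samples = sample_tone(440, 1.0)   # one second of a 440 Hz tone
first_100 = samples[:100]         # readings taken 1/16000 s apart
print(len(samples))               # 16000
```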
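DIGITAL SAMPLING
A Quick Sidebar
- Are we losing data while sampling, because of the gaps between readings?
- Not if we sample fast enough. By the Nyquist sampling theorem, a signal
can be reconstructed exactly from its samples as long as the sampling rate
is more than twice the highest frequency in the signal. Sampling at 16 kHz
covers frequencies up to 8 kHz, which is enough for speech recognition.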
Pre-processing our Sampled Sound Data
- We now have an array of numbers with each
number representing the sound wave’s amplitude
at 1/16,000th of a second intervals.
- Instead of feeding these numbers straight into a
neural network, we first do some pre-processing on
the audio data.
- Let’s start by grouping our sampled audio into
20-millisecond-long chunks.
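At 16,000 samples per second, 20 ms is 16,000 × 0.020 = 320 samples, which is where the “first 320 samples” figure comes from. A minimal sketch of the grouping, assuming non-overlapping chunks (many real systems use overlapping windows instead) and a hypothetical `split_into_chunks` helper:

```python
SAMPLE_RATE = 16_000                        # samples per second
CHUNK_MS = 20                               # chunk length used on the slides
CHUNK_LEN = SAMPLE_RATE * CHUNK_MS // 1000  # 320 samples per chunk

def split_into_chunks(samples, chunk_len=CHUNK_LEN):
    """Split audio into consecutive, non-overlapping chunks; a short leftover tail is dropped."""
    return [samples[i:i + chunk_len]
            for i in range(0, len(samples) - chunk_len + 1, chunk_len)]

one_second = [0.0] * SAMPLE_RATE            # placeholder for real samples
chunks = split_into_chunks(one_second)
print(len(chunks), len(chunks[0]))          # 50 320
```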
• Here’s our first 20 milliseconds of audio (i.e., our first 320
samples):
• Plotting those numbers as a simple line graph gives us a
rough approximation of the original sound wave for that
20 millisecond period of time:
• To make this data easier for a neural network to process,
we are going to break apart this complex sound wave
into its component parts.
• We’ll break out the low-pitched parts, the next-lowest-pitched
parts, and so on. Then by adding up how much
energy is in each of those frequency bands (from low to
high), we create a fingerprint for this audio snippet.
• We do this using a mathematical operation called
a Fourier transform.
• It breaks apart the complex sound wave into the simple
sound waves that make it up. Once we have those
individual sound waves, we add up how much energy is
contained in each one.
• Each number below represents how much energy was in
each 50 Hz band of our 20 millisecond audio clip:
• This is a lot easier to see on a chart:
• If we repeat this process on every 20 millisecond chunk
of audio, we end up with a spectrogram (each column
from left to right is one 20 ms chunk):
The full spectrogram of the “hello” sound clip
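The Fourier step can be sketched with a direct discrete Fourier transform, written in plain Python so it needs no libraries; a real pipeline would use a fast Fourier transform such as NumPy’s `numpy.fft.rfft`. With 320 samples at 16 kHz the bands come out 16,000 / 320 = 50 Hz apart, matching the 50 Hz bands described above. Real front ends usually also apply a window function and often a log or mel scale; those steps are left out, and the function names are hypothetical.

```python
import cmath
import math

def band_energies(chunk):
    """Energy in each frequency band of one chunk, via a direct discrete Fourier transform.

    For N samples recorded at R samples/sec the bands are R / N Hz apart, so a
    320-sample (20 ms) chunk at 16 kHz gives 50 Hz bands. Only bins 0..N/2 are
    kept: for real-valued audio the upper half mirrors the lower half.
    """
    n = len(chunk)
    energies = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(chunk))
        energies.append(abs(coeff) ** 2)
    return energies

def spectrogram(chunks):
    """One column of band energies per chunk, left to right in time."""
    return [band_energies(chunk) for chunk in chunks]

# A 1000 Hz test tone completes exactly 20 cycles in one 20 ms chunk.
tone = [math.sin(2 * math.pi * 1000 * t / 16_000) for t in range(320)]
energies = band_energies(tone)
loudest = max(range(len(energies)), key=energies.__getitem__)
print(len(energies), loudest * 50)  # 161 bands; the peak is in the 1000 Hz band
```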
Recognizing Characters from Short Sounds
• Now that we have our audio in a format that’s easy to
process, we will feed it into a deep neural network.
• The input to the neural network will be 20 millisecond
audio chunks.
• For each little audio slice, it will try to figure out
the letter that corresponds to the sound currently being
spoken.
• After we run our entire audio clip through the neural
network (one chunk at a time), we’ll end up with a
mapping of each audio chunk to the letters most likely
spoken during that chunk.
• Here’s what that mapping looks like for saying “Hello”:
• Our neural net is predicting that one likely thing that was
said was “HHHEE_LL_LLLOOO”. But it also thinks that it
could have been “HHHUU_LL_LLLOOO” or
even “AAAUU_LL_LLLOOO”.
• We follow a few steps to clean up this output.
First, we’ll replace any repeated characters with a single
character:
o HHHEE_LL_LLLOOO becomes HE_L_LO
o HHHUU_LL_LLLOOO becomes HU_L_LO
o AAAUU_LL_LLLOOO becomes AU_L_LO
• Then we’ll remove any blanks:
o HE_L_LO becomes HELLO
o HU_L_LO becomes HULLO
o AU_L_LO becomes AULLO
• That leaves us with three possible transcriptions —
 “Hello”, “Hullo” and “Aullo”.
• The trick is to combine these pronunciation-based
predictions with likelihood scores based on a large
database of written text.
• Of our possible transcriptions “Hello”, “Hullo” and “Aullo”,
obviously “Hello” will appear more frequently in a
database of text and thus is probably correct. So we’ll
pick “Hello” as our final transcription instead of the
others. Done!
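The two clean-up rules and the final choice can be sketched together. The merge-then-drop-blanks rule is the one used to decode networks trained with Connectionist Temporal Classification (CTC); note that the blank between the two L’s is what keeps the double L in “HELLO” from being merged away. Ranking candidates by raw word counts is a minimal stand-in for the “large database of written text”: real systems use n-gram or neural language models over whole sentences and combine them with the network’s own scores. The corpus string and function names are hypothetical.

```python
import re
from collections import Counter

BLANK = "_"  # the blank symbol used on the slides

def collapse(frame_letters, blank=BLANK):
    """Step 1: merge runs of repeated characters. Step 2: remove blanks."""
    merged = []
    for ch in frame_letters:
        if not merged or ch != merged[-1]:
            merged.append(ch)
    return "".join(ch for ch in merged if ch != blank)

def word_counts(text):
    """Count how often each lower-cased word appears in a body of written text."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def pick_transcription(frame_strings, counts):
    """Collapse each candidate, then keep the one seen most often in the text."""
    candidates = [collapse(s) for s in frame_strings]
    return max(candidates, key=lambda word: counts[word.lower()])

# A tiny, hypothetical stand-in for a large database of written text.
corpus = "Hello there. She said hello and waved. Hullo is an old spelling of hello."
counts = word_counts(corpus)
outputs = ["HHHEE_LL_LLLOOO", "HHHUU_LL_LLLOOO", "AAAUU_LL_LLLOOO"]
print([collapse(s) for s in outputs])       # ['HELLO', 'HULLO', 'AULLO']
print(pick_transcription(outputs, counts))  # HELLO
```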
What the Future Holds
• Voice will be a primary interface for the connected home, providing a
natural means to communicate with alarm systems, lights, kitchen
appliances, sound systems and more, as users go about their day-
to-day lives.
• More and more cars on the market will adopt intelligent,
voice-driven systems for entertainment and location-based search,
keeping drivers’ and passengers’ eyes and hands free.
• Wearables with small screens or no screen at all will continue their
upward climb in popularity.
• Voice-controlled devices will also dominate workplaces that require
hands-free mobility, such as hospitals, warehouses, laboratories and
production plants.
• Intelligent virtual assistants built into mobile operating systems keep
getting better.
[~] $ Questions_?

More Related Content

What's hot

Speech Recognition
Speech Recognition Speech Recognition
Speech Recognition
Goa App
 
SPEECH RECOGNITION USING NEURAL NETWORK
SPEECH RECOGNITION USING NEURAL NETWORK SPEECH RECOGNITION USING NEURAL NETWORK
SPEECH RECOGNITION USING NEURAL NETWORK
Kamonasish Hore
 
speech processing and recognition basic in data mining
speech processing and recognition basic in  data miningspeech processing and recognition basic in  data mining
speech processing and recognition basic in data mining
Jimit Rupani
 
Automatic Speech Recognition
Automatic Speech RecognitionAutomatic Speech Recognition
Automatic Speech Recognition
International Islamic University
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
SrijanKumar18
 
Automatic speech recognition system
Automatic speech recognition systemAutomatic speech recognition system
Automatic speech recognition system
Alok Tiwari
 
Speech recognition an overview
Speech recognition   an overviewSpeech recognition   an overview
Speech recognition an overview
Varun Jain
 
Speech Recognition by Iqbal
Speech Recognition by IqbalSpeech Recognition by Iqbal
Speech Recognition by Iqbal
Iqbal
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech RecognitionHugo Moreno
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice RecognitionAmrita More
 
Deep Learning For Speech Recognition
Deep Learning For Speech RecognitionDeep Learning For Speech Recognition
Deep Learning For Speech Recognition
ananth
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
Manthan Gandhi
 
Speech synthesis technology
Speech synthesis technologySpeech synthesis technology
Speech synthesis technology
Kalluri Madhuri
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
Richie
 
Speech recognition system seminar
Speech recognition system seminarSpeech recognition system seminar
Speech recognition system seminarDiptimaya Sarangi
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
Ankit Gujrati
 
Speech recognition An overview
Speech recognition An overviewSpeech recognition An overview
Speech recognition An overview
sajanazoya
 
Artificial intelligence Speech recognition system
Artificial intelligence Speech recognition systemArtificial intelligence Speech recognition system
Artificial intelligence Speech recognition system
REHMAT ULLAH
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversionankit_saluja
 

What's hot (20)

Speech Recognition
Speech Recognition Speech Recognition
Speech Recognition
 
SPEECH RECOGNITION USING NEURAL NETWORK
SPEECH RECOGNITION USING NEURAL NETWORK SPEECH RECOGNITION USING NEURAL NETWORK
SPEECH RECOGNITION USING NEURAL NETWORK
 
speech processing and recognition basic in data mining
speech processing and recognition basic in  data miningspeech processing and recognition basic in  data mining
speech processing and recognition basic in data mining
 
Automatic Speech Recognition
Automatic Speech RecognitionAutomatic Speech Recognition
Automatic Speech Recognition
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Automatic speech recognition system
Automatic speech recognition systemAutomatic speech recognition system
Automatic speech recognition system
 
Speech recognition an overview
Speech recognition   an overviewSpeech recognition   an overview
Speech recognition an overview
 
Speech Recognition by Iqbal
Speech Recognition by IqbalSpeech Recognition by Iqbal
Speech Recognition by Iqbal
 
Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Voice Recognition
Voice RecognitionVoice Recognition
Voice Recognition
 
Deep Learning For Speech Recognition
Deep Learning For Speech RecognitionDeep Learning For Speech Recognition
Deep Learning For Speech Recognition
 
Speech processing
Speech processingSpeech processing
Speech processing
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Speech synthesis technology
Speech synthesis technologySpeech synthesis technology
Speech synthesis technology
 
Automatic speech recognition
Automatic speech recognitionAutomatic speech recognition
Automatic speech recognition
 
Speech recognition system seminar
Speech recognition system seminarSpeech recognition system seminar
Speech recognition system seminar
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
 
Speech recognition An overview
Speech recognition An overviewSpeech recognition An overview
Speech recognition An overview
 
Artificial intelligence Speech recognition system
Artificial intelligence Speech recognition systemArtificial intelligence Speech recognition system
Artificial intelligence Speech recognition system
 
Speech to text conversion
Speech to text conversionSpeech to text conversion
Speech to text conversion
 

Similar to Speech Recognition System

Speech recognizers & generators
Speech recognizers & generatorsSpeech recognizers & generators
Speech recognizers & generators
Paul Kahoro
 
Speech Analysis
Speech AnalysisSpeech Analysis
Speech Analysis
Mohamed Essam
 
Assign
AssignAssign
Digital speech within 125 hz bandwidth (DS-125)
Digital speech within 125 hz bandwidth (DS-125)Digital speech within 125 hz bandwidth (DS-125)
Digital speech within 125 hz bandwidth (DS-125)
Michael Lebo
 
How speech reorganization works
How speech reorganization worksHow speech reorganization works
How speech reorganization worksMuhammad Taqi
 
Silent sound interface
Silent sound interfaceSilent sound interface
Silent sound interfaceJeevitha Reddy
 
Digital speech within 125 hz bandwidth
Digital speech within 125 hz bandwidthDigital speech within 125 hz bandwidth
Digital speech within 125 hz bandwidth
AlexLuther3
 
Reverb w5 imp_2
Reverb w5 imp_2Reverb w5 imp_2
Reverb w5 imp_2
Jan Zurcher
 
Silent sound technologyrevathippt
Silent sound technologyrevathipptSilent sound technologyrevathippt
Silent sound technologyrevathippt
revathiyadavb
 
Ig2task1worksheetelliot 140511141816-phpapp02
Ig2task1worksheetelliot 140511141816-phpapp02Ig2task1worksheetelliot 140511141816-phpapp02
Ig2task1worksheetelliot 140511141816-phpapp02
ElliotBlack
 
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
Simplilearn
 
Dante Audio Networking Fundamentals
Dante Audio Networking FundamentalsDante Audio Networking Fundamentals
Dante Audio Networking Fundamentals
rAVe [PUBS]
 
COMP 4010 Lecture5 VR Audio and Tracking
COMP 4010 Lecture5 VR Audio and TrackingCOMP 4010 Lecture5 VR Audio and Tracking
COMP 4010 Lecture5 VR Audio and Tracking
Mark Billinghurst
 
Ece speech-recognition-report
Ece speech-recognition-reportEce speech-recognition-report
Ece speech-recognition-report
Anakali Mahesh
 
Artificial Intelligence - An Introduction
Artificial Intelligence - An Introduction Artificial Intelligence - An Introduction
Artificial Intelligence - An Introduction
acemindia
 

Similar to Speech Recognition System (20)

Speech recognizers & generators
Speech recognizers & generatorsSpeech recognizers & generators
Speech recognizers & generators
 
Speech Analysis
Speech AnalysisSpeech Analysis
Speech Analysis
 
Assign
AssignAssign
Assign
 
Digital speech within 125 hz bandwidth (DS-125)
Digital speech within 125 hz bandwidth (DS-125)Digital speech within 125 hz bandwidth (DS-125)
Digital speech within 125 hz bandwidth (DS-125)
 
How speech reorganization works
How speech reorganization worksHow speech reorganization works
How speech reorganization works
 
Silent sound interface
Silent sound interfaceSilent sound interface
Silent sound interface
 
Digital speech within 125 hz bandwidth
Digital speech within 125 hz bandwidthDigital speech within 125 hz bandwidth
Digital speech within 125 hz bandwidth
 
Week two a d conversion
Week two a d conversionWeek two a d conversion
Week two a d conversion
 
Reverb w5 imp_2
Reverb w5 imp_2Reverb w5 imp_2
Reverb w5 imp_2
 
Silent sound technologyrevathippt
Silent sound technologyrevathipptSilent sound technologyrevathippt
Silent sound technologyrevathippt
 
Thingy editedd
Thingy editeddThingy editedd
Thingy editedd
 
Ig2task1worksheetelliot 140511141816-phpapp02
Ig2task1worksheetelliot 140511141816-phpapp02Ig2task1worksheetelliot 140511141816-phpapp02
Ig2task1worksheetelliot 140511141816-phpapp02
 
Automatic Speech Recognion
Automatic Speech RecognionAutomatic Speech Recognion
Automatic Speech Recognion
 
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
What Is A Neural Network? | How Deep Neural Networks Work | Neural Network Tu...
 
Dante Audio Networking Fundamentals
Dante Audio Networking FundamentalsDante Audio Networking Fundamentals
Dante Audio Networking Fundamentals
 
The secerts to great sounding samples.txt
The secerts to great sounding samples.txtThe secerts to great sounding samples.txt
The secerts to great sounding samples.txt
 
The secerts to great sounding samples.txt
The secerts to great sounding samples.txtThe secerts to great sounding samples.txt
The secerts to great sounding samples.txt
 
COMP 4010 Lecture5 VR Audio and Tracking
COMP 4010 Lecture5 VR Audio and TrackingCOMP 4010 Lecture5 VR Audio and Tracking
COMP 4010 Lecture5 VR Audio and Tracking
 
Ece speech-recognition-report
Ece speech-recognition-reportEce speech-recognition-report
Ece speech-recognition-report
 
Artificial Intelligence - An Introduction
Artificial Intelligence - An Introduction Artificial Intelligence - An Introduction
Artificial Intelligence - An Introduction
 

Recently uploaded

Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
beazzy04
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
Col Mukteshwar Prasad
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
AzmatAli747758
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
GeoBlogs
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
PedroFerreira53928
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
TechSoup
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Thiyagu K
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
siemaillard
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
Anna Sz.
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
Mohd Adib Abd Muin, Senior Lecturer at Universiti Utara Malaysia
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
Celine George
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
Excellence Foundation for South Sudan
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
Vivekanand Anglo Vedic Academy
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
BhavyaRajput3
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
PedroFerreira53928
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
Celine George
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
joachimlavalley1
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
Jisc
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
MIRIAMSALINAS13
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
Atul Kumar Singh
 

Recently uploaded (20)

Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345Sha'Carri Richardson Presentation 202345
Sha'Carri Richardson Presentation 202345
 
How to Break the cycle of negative Thoughts
How to Break the cycle of negative ThoughtsHow to Break the cycle of negative Thoughts
How to Break the cycle of negative Thoughts
 
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...Cambridge International AS  A Level Biology Coursebook - EBook (MaryFosbery J...
Cambridge International AS A Level Biology Coursebook - EBook (MaryFosbery J...
 
Fish and Chips - have they had their chips
Fish and Chips - have they had their chipsFish and Chips - have they had their chips
Fish and Chips - have they had their chips
 
Basic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumersBasic phrases for greeting and assisting costumers
Basic phrases for greeting and assisting costumers
 
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup   New Member Orientation and Q&A (May 2024).pdfWelcome to TechSoup   New Member Orientation and Q&A (May 2024).pdf
Welcome to TechSoup New Member Orientation and Q&A (May 2024).pdf
 
Unit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdfUnit 2- Research Aptitude (UGC NET Paper I).pdf
Unit 2- Research Aptitude (UGC NET Paper I).pdf
 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
 
Polish students' mobility in the Czech Republic
Polish students' mobility in the Czech RepublicPolish students' mobility in the Czech Republic
Polish students' mobility in the Czech Republic
 
Chapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptxChapter 3 - Islamic Banking Products and Services.pptx
Chapter 3 - Islamic Banking Products and Services.pptx
 
How to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERPHow to Create Map Views in the Odoo 17 ERP
How to Create Map Views in the Odoo 17 ERP
 
Introduction to Quality Improvement Essentials
Introduction to Quality Improvement EssentialsIntroduction to Quality Improvement Essentials
Introduction to Quality Improvement Essentials
 
Sectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdfSectors of the Indian Economy - Class 10 Study Notes pdf
Sectors of the Indian Economy - Class 10 Study Notes pdf
 
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCECLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
CLASS 11 CBSE B.St Project AIDS TO TRADE - INSURANCE
 
PART A. Introduction to Costumer Service
PART A. Introduction to Costumer ServicePART A. Introduction to Costumer Service
PART A. Introduction to Costumer Service
 
Model Attribute Check Company Auto Property
Model Attribute  Check Company Auto PropertyModel Attribute  Check Company Auto Property
Model Attribute Check Company Auto Property
 
Additional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdfAdditional Benefits for Employee Website.pdf
Additional Benefits for Employee Website.pdf
 
How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...How libraries can support authors with open access requirements for UKRI fund...
How libraries can support authors with open access requirements for UKRI fund...
 
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXXPhrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
Phrasal Verbs.XXXXXXXXXXXXXXXXXXXXXXXXXX
 
Language Across the Curriculm LAC B.Ed.
Language Across the  Curriculm LAC B.Ed.Language Across the  Curriculm LAC B.Ed.
Language Across the Curriculm LAC B.Ed.
 

Speech Recognition System

  • 2. SPEECH RECOGNITION A Process that enables the computers to recognize and translate spoken language into text. It is also known as "automatic speech recognition" (ASR), "computer speech recognition", or just "speech to text" (STT).
  • 3. APPLICATIONS • Medical Transcription • Military • Telephone and similar domains • Serving the disabled • Home automation system • Automobile • Voice dialing (“Call home” ) • Data entry (“A pin number”) • Speech to text processing (“word processors, emails”)
  • 4. RECOGNITION PROCESS Voice Input Analog to Digital Acoustic Model Language Model Out Speech EngineFeedback
  • 5. HOW DO HUMANS DO IT ? Articulation produces sound waves which the ear conveys to the brain for processing
  • 6. HOW MIGHT COMPUTERS DO IT ? Acoustic waveform Acoustic signal Speech recognition • Digitization • Acoustic analysis of the speech signal • Linguistic interpretation
  • 7.
  • 8. FLOW SUMMERY OF RECOGNITION PROCESS  User Input: System catches users’ voice in the form of analog acoustic signal.  Digitization: Digitize the analog signal.  Phonetic Breakdown: Breaking signals into phenome.
  • 9. FLOW SUMMERY OF RECOGNITION PROCESS  Statistical Modeling: Mapping phenomes to their phonetic representation using statistics model.  Matching: According to Grammar, phonetic representation and Dictionary, the system returns a word plus a confidence score)
  • 10. TYPES OF SPEECH RECOGNITION • SPEAKER INDEPANDENT: Recognize speech of a large group of people • SPEAKER DEPANDENT: Recognize speech patterns from only person • SPEARKER ADAPTIVE: System usually begins with a speaker independent model and adjust these models more closely to each individual during a brief training period
  • 12. Template-based approach • Store examples of units (words, phenomes), then find the example that most closely fits the input • Just a complex similarity matching problem • OK for discrete utterances, and single user
  • 13. Template-based approach • Hard to distinguished very similar templates • Quickly degrades when input differs from template
  • 14. Statistics based approach • Collects a large corpus of transcribed speech recording • Train the computer to learn the correspondences at different possibilities(Machine Learning) • At run time, apply the statistical processes to search through the space of all possible solutions, and pick the statistically most likely one
  • 15. What’s Hard About That ? • Digitization: Analog signals into Digital representation • Signal Processing: Separating speech from background noise • Phonetics: Variability in human speech • Channel Variability: The quality and position of microphone and background environment will affect the output
  • 16. SPEECH RECOGNITION THROUGH THE DECADES - 1950-60s (Baby-Talk) • ‘They’ first focus on NUMBERS • Recognize only DIGITS • 1962, IBM developed ‘SHOEBOX’ which can recognize 16 words spoken in English
  • 17. SPEECH RECOGNITION THROUGH THE DECADES - 1970s (SR Takes Off) • The U.S. DoD’s DARPA initiated a research program called the Speech Understanding Research program. • One of its systems, Carnegie Mellon’s ‘HARPY’, could understand 1,011 words. • The first commercial speech recognition company, Threshold Technology, was set up, and Bell Laboratories introduced a system that could interpret multiple people’s voices.
  • 18. SPEECH RECOGNITION THROUGH THE DECADES - 1980s (SR Turns Toward Prediction) • SR vocabulary jumped from a few hundred words to several thousand words • One major reason was a new statistical method known as the hidden Markov model (HMM). • Rather than simply using templates for words and looking for sound patterns, an HMM considers the probability that unknown sounds are words. • Programs took discrete dictation, so you had … to … pause … after … each … and … every … word.
  • 19. SPEECH RECOGNITION THROUGH THE DECADES - 1990s (Automatic Speech Recognition) • In the '90s, computers with faster processors finally arrived, and speech recognition software became viable for ordinary people. • Dragon NaturallySpeaking arrived. The application recognized continuous speech, so one could speak, well, naturally, at about 100 words per minute. However, the user had to spend about 45 minutes training it.
  • 20. SPEECH RECOGNITION THROUGH THE DECADES - 2000s • Accuracy topped out at about 80% • In 2002, Google Voice Search was released, allowing users to use Google Search by speaking on a mobile phone or computer • In 2011, Apple’s Siri was released. It’s a built-in “intelligent assistant” that lets Apple users speak voice commands to operate the mobile device and its apps • In 2014, Microsoft’s Cortana was released. It’s also a built-in “intelligent personal assistant” that can set reminders, recognize natural voice without requiring keyboard input, and answer questions using information from the Bing search engine.
  • 23. Artificial Neural Net [Figure: sound wave of someone saying ‘Hello’]
  • 24. Artificial Neural Net • But we aren’t quite there yet. • The big problem is that speech varies in speed. • One person might say “hello!” very quickly and another person might say “heeeelllllllllllllooooo!” very slowly, producing a much longer sound file with much more data. Both sound files should be recognized as exactly the same text: “hello!” • Automatically aligning audio files of various lengths to a fixed-length piece of text turns out to be pretty hard. • To work around this, we have to use some special tricks and extra processing in addition to a deep neural network. Let’s see how it works!
  • 25. Turning Sounds into Bits - The first step in speech recognition is obvious: we need to feed sound waves into a computer. - But sound is transmitted as waves. How do we turn sound waves into numbers?
  • 26. A waveform of someone saying “Hello”
  • 27. Let’s zoom in on one tiny part of the sound wave and take a look:
  • 28. To turn this sound wave into numbers, we just record the height of the wave at equally spaced points:
  • 29. • This is called sampling. • We take a reading thousands of times a second, recording a number that represents the height of the sound wave at that point in time. • Sampled at 16 kHz (16,000 samples/sec). • Let’s sample our “Hello” sound wave 16,000 times per second. Here are the first 100 samples (each number represents the amplitude of the sound wave at intervals of 1/16,000th of a second):
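We can't reproduce the slide's actual “Hello” samples, so this minimal sketch samples a synthetic 440 Hz sine tone (an assumption standing in for real speech) at the same 16 kHz rate. `sample_tone` is a hypothetical helper name.

```python
import math

SAMPLE_RATE = 16_000  # samples per second (16 kHz, as on the slide)


def sample_tone(freq_hz: float, duration_s: float, rate: int = SAMPLE_RATE) -> list[float]:
    """Record the wave's height at equally spaced instants t = n / rate seconds."""
    n_samples = round(duration_s * rate)
    return [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(n_samples)]


samples = sample_tone(440.0, 1.0)  # one second of a 440 Hz tone
print(len(samples))                # 16000
print([round(x, 3) for x in samples[:5]])  # first five amplitudes
```

Each list element plays the role of one number on the slide: the amplitude 1/16,000th of a second after the previous one.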
  • 30. DIGITAL SAMPLING: A Quick Sidebar - Are we losing data while sampling, due to the gaps?
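The slide leaves the question open. The standard answer, added here, is the Nyquist-Shannon sampling theorem: a signal whose frequencies all lie below half the sampling rate can be reconstructed exactly from its samples, so sampling at 16 kHz loses nothing below 8 kHz. What does get lost is content above that limit: it folds back (aliases) onto lower frequencies, which is why audio is low-pass filtered before sampling. This sketch shows a 9 kHz cosine sampled at 16 kHz producing the same numbers as a 7 kHz cosine; `sample_cos` is a hypothetical helper.

```python
import math

RATE = 16_000        # samples per second
NYQUIST = RATE / 2   # highest frequency this rate can represent: 8,000 Hz


def sample_cos(freq_hz: float, n_samples: int, rate: int = RATE) -> list[float]:
    return [math.cos(2 * math.pi * freq_hz * n / rate) for n in range(n_samples)]


low = sample_cos(7_000, 32)   # below the Nyquist frequency
high = sample_cos(9_000, 32)  # above it: folds back to 16,000 - 9,000 = 7,000 Hz

# The samples match (up to rounding), so once sampled, the 9 kHz tone
# is indistinguishable from a 7 kHz tone.
print(all(abs(a - b) < 1e-9 for a, b in zip(low, high)))  # True
```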
  • 31. Pre-processing our Sampled Sound Data - We now have an array of numbers, with each number representing the sound wave’s amplitude at 1/16,000th of a second intervals. - Instead of feeding these numbers right into a neural network, we first do some pre-processing on the audio data. - Let’s start by grouping our sampled audio into 20-millisecond-long chunks.
  • 32. • Here’s our first 20 milliseconds of audio (i.e., our first 320 samples):
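Grouping samples into 20 ms chunks is simple arithmetic: 16,000 samples/sec × 0.020 sec = 320 samples per chunk. This sketch assumes back-to-back, non-overlapping chunks and drops any incomplete chunk at the end; many practical front ends instead use overlapping windows (for example, 25 ms windows every 10 ms). `split_into_chunks` is a hypothetical name.

```python
RATE = 16_000                     # samples per second
CHUNK_MS = 20                     # chunk length from the slides
CHUNK = RATE * CHUNK_MS // 1000   # 16,000 * 0.020 = 320 samples per chunk


def split_into_chunks(samples: list[float], chunk: int = CHUNK) -> list[list[float]]:
    """Group samples into consecutive, non-overlapping chunks; drop a short tail."""
    n_full = len(samples) // chunk
    return [samples[i * chunk:(i + 1) * chunk] for i in range(n_full)]


one_second = [0.0] * RATE
chunks = split_into_chunks(one_second)
print(CHUNK, len(chunks))  # 320 50
```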
  • 33. • Plotting those numbers as a simple line graph gives us a rough approximation of the original sound wave for that 20 millisecond period of time:
  • 34. • To make this data easier for a neural network to process, we are going to break apart this complex sound wave into its component parts. • We’ll break out the low-pitched parts, the next-lowest-pitched parts, and so on. Then by adding up how much energy is in each of those frequency bands (from low to high), we create a fingerprint for this audio snippet. • We do this using a mathematical operation called a Fourier transform. • It breaks apart the complex sound wave into the simple sound waves that make it up. Once we have those individual sound waves, we add up how much energy is contained in each one.
  • 35. • Each number below represents how much energy was in each 50 Hz band of our 20-millisecond audio clip:
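The slides don't show how the energies are computed. This sketch writes out a plain discrete Fourier transform (DFT) using only the standard library, so it is slow; a real system would use an FFT library. It takes “energy” to mean the squared magnitude of each DFT coefficient, which is an assumption. One consequence of the numbers on the slides: with 320 samples at 16 kHz, DFT bins are 16,000 / 320 = 50 Hz apart, which matches the 50 Hz bands. `band_energies` is a hypothetical name, and the 1,000 Hz test tone is synthetic.

```python
import cmath
import math

RATE = 16_000


def band_energies(chunk: list[float], rate: int = RATE) -> list[float]:
    """Energy in each DFT bin from 0 Hz up to rate/2, via a naive DFT.

    Bin k is centred on k * rate / len(chunk) Hz: 50 Hz spacing for a
    320-sample (20 ms) chunk at 16 kHz.
    """
    n = len(chunk)
    energies = []
    for k in range(n // 2 + 1):  # real input: non-negative frequencies suffice
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(chunk))
        energies.append(abs(coeff) ** 2)
    return energies


# A 20 ms chunk containing a pure 1,000 Hz tone (synthetic stand-in for speech).
chunk = [math.sin(2 * math.pi * 1_000 * t / RATE) for t in range(320)]
energies = band_energies(chunk)
print(len(energies))  # 161 bins: 0, 50, ..., 8000 Hz
peak = max(range(len(energies)), key=energies.__getitem__)
print(peak * RATE / len(chunk))  # 1000.0
```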
  • 36. • A lot easier to read on a chart:
  • 37. • If we repeat this process on every 20 millisecond chunk of audio, we end up with a spectrogram (each column from left-to-right is one 20ms chunk): The full spectrogram of the “hello” sound clip
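Building the spectrogram repeats the per-chunk energy computation along the recording. This self-contained sketch computes band energies with a naive DFT (squared magnitudes, 50 Hz bins) and stacks one column per 20 ms chunk. The signal is synthetic and hypothetical: 60 ms at 500 Hz followed by 60 ms at 2,000 Hz, so the strongest band moves partway through. The helper names are illustrative.

```python
import cmath
import math

RATE = 16_000
CHUNK = 320  # 20 ms at 16 kHz


def energies(chunk: list[float]) -> list[float]:
    """Squared DFT magnitude for bins 0..N/2 (50 Hz apart for a 320-sample chunk)."""
    n = len(chunk)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(chunk))) ** 2
        for k in range(n // 2 + 1)
    ]


def spectrogram(samples: list[float]) -> list[list[float]]:
    """One column of band energies per consecutive 20 ms chunk (left to right = time)."""
    return [energies(samples[i:i + CHUNK]) for i in range(0, len(samples) - CHUNK + 1, CHUNK)]


# Synthetic signal: 960 samples at 500 Hz, then 960 samples at 2,000 Hz.
signal = [math.sin(2 * math.pi * (500 if t < 960 else 2_000) * t / RATE) for t in range(1_920)]
spec = spectrogram(signal)
print(len(spec))  # 6 columns (6 chunks of 20 ms)
peaks_hz = [max(range(len(col)), key=col.__getitem__) * (RATE // CHUNK) for col in spec]
print(peaks_hz)   # [500, 500, 500, 2000, 2000, 2000]
```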
  • 38. Recognizing Characters from Short Sounds • Now that we have our audio in a format that’s easy to process, we will feed it into a deep neural network. • The input to the neural network will be 20-millisecond audio chunks. • For each little audio slice, it will try to figure out the letter that corresponds to the sound currently being spoken.
  • 40. • After we run our entire audio clip through the neural network (one chunk at a time), we’ll end up with a mapping of each audio chunk to the letters most likely spoken during that chunk. • Here’s what that mapping looks like for “Hello”:
  • 42. • Our neural net predicts that one likely thing that was said was “HHHEE_LL_LLLOOO”. But it also thinks it could have been “HHHUU_LL_LLLOOO” or even “AAAUU_LL_LLLOOO”. • We follow a few steps to clean up this output. First, we replace any run of repeated characters with a single character: o HHHEE_LL_LLLOOO becomes HE_L_LO o HHHUU_LL_LLLOOO becomes HU_L_LO o AAAUU_LL_LLLOOO becomes AU_L_LO
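The collapsing step can be written in a couple of lines with `itertools.groupby`, which groups runs of identical consecutive characters. This repeat-collapsing, followed by blank removal, is the decoding rule used by connectionist temporal classification (CTC); the slides don't name CTC, so treat that link as our gloss. The sketch assumes `_` is the blank symbol.

```python
from itertools import groupby


def collapse_repeats(path: str) -> str:
    """Replace each run of identical consecutive characters with a single one."""
    return "".join(char for char, _ in groupby(path))


for raw in ["HHHEE_LL_LLLOOO", "HHHUU_LL_LLLOOO", "AAAUU_LL_LLLOOO"]:
    print(raw, "->", collapse_repeats(raw))
# HHHEE_LL_LLLOOO -> HE_L_LO
# HHHUU_LL_LLLOOO -> HU_L_LO
# AAAUU_LL_LLLOOO -> AU_L_LO
```

The blank is what preserves the double "L" in "hello": without the `_` between them, the two runs of L would merge into one.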
  • 43. • Then we remove any blanks: o HE_L_LO becomes HELLO o HU_L_LO becomes HULLO o AU_L_LO becomes AULLO • That leaves us with three possible transcriptions: “Hello”, “Hullo” and “Aullo”. • The trick is to combine these pronunciation-based predictions with likelihood scores based on a large database of written text. • Of our possible transcriptions, “Hello” will appear far more frequently in a database of text and is thus probably correct. So we pick “Hello” as our final transcription instead of the others. Done!
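The final choice can be sketched as scoring each collapsed candidate by its acoustic probability times how common the resulting word is in a text corpus. All numbers below (the acoustic probabilities and the word counts) are hypothetical, chosen only for illustration; real systems use a trained language model with smoothing rather than raw counts. `remove_blanks` and `best_transcription` are hypothetical names.

```python
BLANK = "_"

# Hypothetical numbers for illustration only: how likely the acoustic model
# finds each collapsed path, and how often each word appears in a text corpus.
acoustic_prob = {"HE_L_LO": 0.40, "HU_L_LO": 0.45, "AU_L_LO": 0.15}
word_counts = {"HELLO": 50_000, "HULLO": 40, "AULLO": 0}
total_words = 1_000_000


def remove_blanks(path: str) -> str:
    return path.replace(BLANK, "")


def best_transcription(candidates: dict[str, float]) -> str:
    """Pick the candidate maximising acoustic probability x text-frequency probability."""
    def score(path: str) -> float:
        word = remove_blanks(path)
        # unseen words get probability 0 here; real language models smooth this
        return candidates[path] * word_counts.get(word, 0) / total_words
    return remove_blanks(max(candidates, key=score))


print([remove_blanks(p) for p in acoustic_prob])  # ['HELLO', 'HULLO', 'AULLO']
print(best_transcription(acoustic_prob))          # HELLO
```

In these made-up numbers the acoustic model slightly prefers "HULLO", but the text frequencies tip the decision to "HELLO", which is the point the slide makes.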
  • 44. What the Future Holds • Voice will be a primary interface for the connected home, providing a natural means to communicate with alarm systems, lights, kitchen appliances, sound systems and more, as users go about their day-to-day lives. • More and more cars on the market will adopt intelligent, voice-driven systems for entertainment and location-based search, keeping drivers’ and passengers’ eyes and hands free. • Small-screened and screenless wearables will continue their upward climb in popularity. • Voice-controlled devices will also dominate workplaces that require hands-free mobility, such as hospitals, warehouses, laboratories and production plants. • Intelligent virtual assistants built into mobile operating systems will keep getting better.