What is Auto Speech Recognition (ASR)?
Hello,
how are you?
Voice → Speech Waveform → Feature from Audio (e.g., Spectrogram) → Auto ML → Text
Automatic Speech Recognition (ASR)
The technology of converting speech into written form (called speech-to-text),
so that a human can interpret the meaning of the text.
1/7
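The "Feature from Audio (e.g., Spectrogram)" step can be illustrated with a minimal sketch. It assumes NumPy, a 25 ms Hann window, and a 10 ms hop; the function name `spectrogram` and the synthetic 440 Hz tone are illustrative choices, not something taken from the slides.

```python
import numpy as np

def spectrogram(waveform, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Return a log-magnitude spectrogram of shape (num_frames, num_bins)."""
    frame_len = int(sample_rate * frame_ms / 1000)   # samples per window
    hop_len = int(sample_rate * hop_ms / 1000)       # samples between window starts
    window = np.hanning(frame_len)                   # taper to reduce spectral leakage
    num_frames = 1 + (len(waveform) - frame_len) // hop_len
    frames = np.stack([
        waveform[i * hop_len : i * hop_len + frame_len] * window
        for i in range(num_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # one spectrum per frame
    return np.log(magnitude + 1e-10)                 # small offset avoids log(0)

# One second of a 440 Hz tone at 16 kHz stands in for recorded speech (assumption).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 440 * t)
spec = spectrogram(tone, sample_rate)
```

Each row of `spec` is the spectrum of one short slice of audio, which is the kind of picture a spectrogram plots over time.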
How to Understand Speech: Approach by Educated Human
• Reading Spectrogram
source: Step by step through a spectrogram, https://www.youtube.com/watch?v=lfZ6XSRaRR8
2/7
How to Understand Speech: Approach by AI
• Statistical ASR system
Sound Waveform
↓
Acoustic Feature Extraction
↓
Acoustic Model
↓
Pronunciation Model
↓
Language Model
↓
Text
• End-to-End Neural ASR System
Sound Waveform
↓
Acoustic Feature Extraction
↓
Acoustic Model with DNN
↓
Language Model with DNN
↓
Text
(Graves and Jaitly, 2014, “Towards End-to-End Speech
Recognition with Recurrent Neural Networks”, ICML)
(Chan et al., 2016, “Listen, Attend and Spell: A Neural Network
for Large Vocabulary Conversational Speech Recognition”, ICASSP)
3/7
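The first pipeline above (acoustic model, pronunciation model, language model) is commonly summarized as a single decoding rule. The following is a sketch in standard textbook notation; the symbols X (acoustic features), Q (phone sequence), and W (word sequence) are assumed here and do not appear on the slides.

```latex
W^{*} = \arg\max_{W} P(W \mid X)
      \approx \arg\max_{W} \sum_{Q}
        \underbrace{P(X \mid Q)}_{\text{acoustic model}}
        \underbrace{P(Q \mid W)}_{\text{pronunciation model}}
        \underbrace{P(W)}_{\text{language model}}
```

Under this reading, the end-to-end approach replaces the separately built factors with neural networks trained to map features to text more directly.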
Challenges in a System Using Only Acoustic Model
Word: “Probably”
Dictionary pronunciation: p r aa b ax b l iy
Actual pronunciations (many common ways to pronounce):
• p r aa b iy
• p r aw l uh
• p r aa l iy
• p aa b uh b l iy
• p ow ih
• p aa iy
• p r ah b iy
(Preethi Jyothi, 2017, “Automatic Speech Recognition - An Overview”)
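I [probably | probability] play tennis.
• probably (p r aa b iy)
• probability (p r aa b iy)
Language Model: both candidates can be heard as p r aa b iy, so the language model has to choose the word that fits the sentence.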
4/7
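The "probably" versus "probability" example can be sketched as a tiny rescoring step. All probabilities below are made-up numbers for illustration, and the bigram table and the function name `sentence_score` are hypothetical; a real system would estimate these from data.

```python
import math

# Hypothetical acoustic log-scores: both words were heard as "p r aa b iy",
# so the acoustic model alone cannot separate them.
acoustic_logp = {"probably": math.log(0.5), "probability": math.log(0.5)}

# Hypothetical bigram probabilities P(word | previous word), invented for this sketch.
bigram_p = {
    ("I", "probably"): 0.01,
    ("I", "probability"): 0.00001,
    ("probably", "play"): 0.02,
    ("probability", "play"): 0.0001,
}

def sentence_score(candidate, left="I", right="play"):
    """Acoustic log-score plus language-model log-score for 'I <candidate> play'."""
    lm_logp = math.log(bigram_p[(left, candidate)]) + math.log(bigram_p[(candidate, right)])
    return acoustic_logp[candidate] + lm_logp

best = max(acoustic_logp, key=sentence_score)
print(best)
```

Because the acoustic scores tie, the choice is decided entirely by the language-model term, which is the point the slide makes.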
Beyond Text – Insights
Insight
• Topic Classification
• Semantic Parsing and Question Answering
• Customer Segmentation/ Prioritization
• Summary Generation
Voice → Sound Waveform → Feature → Auto ML → Text → Insights!
5/7
Current Challenges in ASR
• Noisy real-life conversation with multiple speakers
• Robustness to variations in ages and accents
• Integration of effort across multiple dialects with transfer learning
• Embedded ASR system locally on mobile devices without internet
connection
• Bad channel conditions (intermittently dropping voice)
(Preethi Jyothi, 2018, “State-of-the-Art in Speech Technologies”)
6/7
Great Study Materials on Speech Recognition on YouTube
Deep Learning Lecture Series
1. “CS231N Winter 2016”
Convolutional Neural Networks for Visual Recognition by Stanford University (Andrej Karpathy ver.), 16 videos
2. “CS231N Spring 2017”
Convolutional Neural Networks for Visual Recognition by Stanford University, 16 videos
3. “CS224N Winter 2017”
Natural Language Processing with Deep Learning by Stanford University, 19 videos
Auto Speech Recognition Sessions
1. Automatic Speech Recognition - An Overview
Presenter is Preethi Jyothi, IIT Bombay in Sep 2017
2. State-of-the-Art in Speech Technologies
Presenter is Preethi Jyothi, IIT Bombay in Jan 2018
3. Lecture 2 | Word Vector Representations: word2vec
Lecture 2 in Natural Language Processing with Deep Learning
4. Step by step through a spectrogram
Lecturer is Andy McMillin, Clinical Associate Professor at Portland State University
7/7

Intro to Auto Speech Recognition -- How ML Learns Speech-to-Text


Editor's Notes

  1. The waveform is sliced at a fixed rate because, within a window of around 25 milliseconds, the speech signal is approximately stationary. Starting from the raw speech waveform, we generate tiny slices called speech frames, and each speech frame is represented by a feature.
  2. Amy Costanza Smith
  3. Academic Research Summit, which was co-organized by Microsoft Research, was held at the International Institute of Information Technology (IIIT) Hyderabad on the 24th and 25th of January 2018.
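The framing described in note 1 can be sketched in a few lines. The 25 ms frame length comes from the note; the 10 ms hop between frames, the 16 kHz sample rate, and the function name `split_into_frames` are assumptions made for illustration.

```python
def split_into_frames(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Slice a waveform into overlapping speech frames of frame_ms milliseconds."""
    frame_len = sample_rate * frame_ms // 1000  # samples per frame
    hop_len = sample_rate * hop_ms // 1000      # samples between frame starts
    frames = []
    start = 0
    # Keep only complete frames; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames

# One second of silence at 16 kHz stands in for a real recording (assumption).
samples = [0.0] * 16000
frames = split_into_frames(samples, 16000)
```

Each element of `frames` would then be turned into one feature vector before being passed to the acoustic model.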