What is Automatic Speech Recognition (ASR)?
“Hello, how are you?”
Voice → Speech Waveform → Feature from Audio (e.g., Spectrogram) → Auto ML → Text
Automatic Speech Recognition (ASR)
The technology of converting speech into written form (also called speech-to-text), so that humans can read the text and interpret its meaning.
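The "Feature from Audio (e.g., Spectrogram)" step can be sketched in plain Python as below. This is an illustrative sketch, not the deck's method: the 25 ms frame length matches the editor's note, while the 10 ms hop, the Hann window, and the naive DFT (used instead of an FFT to stay dependency-free) are assumptions. Real systems usually go one step further to log-mel filterbanks or MFCCs.

```python
import cmath
import math

def frame_signal(samples, frame_len, hop):
    """Cut a 1-D signal into overlapping frames; an incomplete tail is dropped."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]

def hann_window(n):
    """Symmetric Hann window; tapering frame edges reduces spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dft_magnitude(frame):
    """Magnitudes of DFT bins 0..N/2 via a naive O(N^2) DFT (fine for a demo)."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def log_spectrogram(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Return one row of log-magnitudes per frame (time x frequency bins)."""
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    window = hann_window(frame_len)
    rows = []
    for frame in frame_signal(samples, frame_len, hop):
        mags = dft_magnitude([x * w for x, w in zip(frame, window)])
        rows.append([math.log(m + 1e-10) for m in mags])  # epsilon avoids log(0)
    return rows

# Usage: 0.1 s of a 1 kHz tone sampled at 8 kHz.
sr = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(sr // 10)]
spec = log_spectrogram(tone, sr)
# Bin width = sr / frame_len = 8000 / 200 = 40 Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin * 40)  # 8 101 1000
```

Each row is one 25 ms frame and each column one frequency bin, which is the image a person "reads" when looking at a spectrogram.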
How to Understand Speech: Approach by a Trained Human
• Reading a spectrogram
Source: “Step by step through a spectrogram”, https://www.youtube.com/watch?v=lfZ6XSRaRR8
How to Understand Speech: Approach by AI
• Conventional (static, modular) ASR system
Sound Waveform
↓
Acoustic Feature Extraction
↓
Acoustic Model
↓
Pronunciation Model
↓
Language Model
↓
Text
• End-to-End Neural ASR System
Sound Waveform
↓
Acoustic Feature Extraction
↓
Acoustic Model with DNN
↓
Language Model with DNN
↓
Text
(Graves & Jaitly, 2014, “Towards End-to-End Speech Recognition with Recurrent Neural Networks”, ICML)
(Chan et al., 2016, “Listen, Attend and Spell: A Neural Network for Large Vocabulary Conversational Speech Recognition”, ICASSP)
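Graves & Jaitly (2014) trained their network with the Connectionist Temporal Classification (CTC) objective, which lets one network emit characters directly from acoustic frames (Listen, Attend and Spell instead uses an attention-based decoder). Below is a minimal sketch of greedy (best-path) CTC decoding, assuming the network has already produced per-frame label probabilities; the toy alphabet, the blank symbol `"_"`, and the probabilities are hypothetical. Production systems usually use beam search, often combined with a language model.

```python
from itertools import groupby

def ctc_greedy_decode(frame_probs, labels, blank="_"):
    """Best-path CTC decoding.

    frame_probs: one list of per-label probabilities per time frame.
    labels: the symbol for each probability index; `blank` is the CTC blank.
    """
    # 1. Pick the most likely label in every frame.
    best_path = [labels[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    # 2. Merge consecutive repeats, then 3. drop blanks.
    merged = [label for label, _ in groupby(best_path)]
    return "".join(label for label in merged if label != blank)

labels = ["_", "h", "i"]  # hypothetical toy alphabet; index 0 is the blank
probs = [
    [0.1, 0.8, 0.1],    # h
    [0.2, 0.7, 0.1],    # h (repeat, merged)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # i
    [0.2, 0.1, 0.7],    # i (repeat, merged)
]
print(ctc_greedy_decode(probs, labels))  # hi
```

The blank is what allows genuine double letters: a path `h _ h` decodes to "hh", whereas `h h` collapses to "h".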
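Challenges in a System Using Only an Acoustic Model
Word: “probably”
Dictionary pronunciation: p r aa b ax b l iy
Actual pronunciations (many common ways to pronounce it):
p r aa b iy
p r aw l uh
p r aa l iy
p aa b uh b l iy
p ow ih
p aa iy
p r ah b iy
(Preethi Jyothi, 2017, “Automatic Speech Recognition - An Overview”)
Language Model
I ___ play tennis.
• probably (p r aa b iy)
• probability (p r aa b iy)
Both candidates can surface as the same reduced pronunciation “p r aa b iy”, so the acoustic model alone cannot tell them apart; the language model resolves the ambiguity in favor of “I probably play tennis.”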
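The disambiguation above follows the classic decoding rule W* = argmax over W of [log P(X | W) + λ · log P(W)], where P(X | W) comes from the acoustic (and pronunciation) model and P(W) from the language model. The sketch below applies that rule to the single ambiguous slot; every probability in it is a hypothetical toy value, and the bigram language model is an assumed stand-in for whatever LM a real system uses.

```python
import math

# Toy, hypothetical numbers for illustration; they are not taken from the slides.
# log P(audio | word): both words can surface as "p r aa b iy", so acoustics tie.
ACOUSTIC_LOGPROB = {"probably": math.log(0.6), "probability": math.log(0.6)}

# Toy bigram language model: log P(word | previous word).
BIGRAM_LOGPROB = {
    ("I", "probably"): math.log(0.05),
    ("I", "probability"): math.log(0.0001),
    ("probably", "play"): math.log(0.1),
    ("probability", "play"): math.log(0.001),
}

def pick_word(prev_word, next_word, candidates, lm_weight=1.0):
    """Choose the candidate maximizing log P(X|W) + lm_weight * log P(W).

    Only the two bigrams touching the slot are scored; bigrams elsewhere in the
    sentence (e.g. "play tennis") are identical for every candidate and cancel out.
    """
    def score(word):
        lm = BIGRAM_LOGPROB[(prev_word, word)] + BIGRAM_LOGPROB[(word, next_word)]
        return ACOUSTIC_LOGPROB[word] + lm_weight * lm
    return max(candidates, key=score)

print(pick_word("I", "play", ["probability", "probably"]))  # probably
```

Because the acoustic scores tie, the decision comes entirely from the language-model term, which is exactly the situation the slide illustrates.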
Beyond Text – Insights
Insights:
• Topic Classification
• Semantic Parsing and Question Answering
• Customer Segmentation/Prioritization
• Summary Generation
Voice → Sound Waveform → Feature → Auto ML → Text → Insights!
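As a toy illustration of the "Text → Insights" step, the sketch below assigns a topic to an ASR transcript by counting keyword hits. The topic names and keyword lists are hypothetical, and keyword counting is a deliberately simple stand-in; a real topic classifier would normally be trained on labelled transcripts.

```python
import re
from collections import Counter

# Hypothetical keyword lists, for illustration only.
TOPIC_KEYWORDS = {
    "billing": {"invoice", "charge", "charged", "refund", "payment"},
    "technical_support": {"error", "crash", "install", "password"},
}

def classify_topic(transcript: str) -> str:
    """Return the topic whose keywords occur most often, or "unknown" if none match."""
    tokens = Counter(re.findall(r"[a-z']+", transcript.lower()))
    scores = {topic: sum(tokens[word] for word in words)
              for topic, words in TOPIC_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(classify_topic("I was charged twice and I want a refund."))  # billing
```

Since ASR output can contain recognition errors, downstream steps like this one inherit those errors, which is one reason transcript accuracy matters for the insights built on top of it.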
Current Challenges in ASR
• Noisy, real-life conversations with multiple speakers
• Robustness to variations in speaker age and accent
• Sharing effort across multiple dialects through transfer learning
• Running ASR locally on mobile devices without an internet connection
• Poor channel conditions (e.g., intermittently dropped voice)
(Preethi Jyothi, 2018, “State-of-the-Art in Speech Technologies”)
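One widely used way to make a model more robust to noisy conditions is to train on copies of clean recordings mixed with noise at a controlled signal-to-noise ratio (SNR). The sketch below shows only the mixing arithmetic, assuming equal-length clean and noise signals given as lists of floats; the 440 Hz tone, the Gaussian noise, and the 10 dB target are hypothetical choices for the usage example.

```python
import math
import random

def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean`, scaled so the result has the requested SNR in dB.

    SNR_dB = 10 * log10(P_clean / P_noise), where P is mean power (mean of squares).
    """
    if len(noise) != len(clean):
        raise ValueError("clean and noise must have the same length")
    p_clean = sum(x * x for x in clean) / len(clean)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise signal is silent")
    # Target noise power is p_clean / 10^(snr_db / 10); amplitude scales with sqrt(power).
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [x + scale * n for x, n in zip(clean, noise)]

# Usage: 0.1 s of a 440 Hz tone at 16 kHz mixed with Gaussian noise at 10 dB SNR.
random.seed(0)
sr = 16000
clean = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr // 10)]
noise = [random.gauss(0.0, 1.0) for _ in clean]
noisy = mix_at_snr(clean, noise, snr_db=10)
```

Sweeping `snr_db` over a range during training exposes the model to both mildly and heavily degraded audio.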
Great Study Materials on Speech Recognition on YouTube
Deep Learning Lecture Series
1. “CS231N Winter 2016”
Convolutional Neural Networks for Visual Recognition, Stanford University (Andrej Karpathy's version), 16 videos
2. “CS231N Spring 2017”
Convolutional Neural Networks for Visual Recognition, Stanford University, 16 videos
3. “CS224N Winter 2017”
Natural Language Processing with Deep Learning, Stanford University, 19 videos
Automatic Speech Recognition Sessions
1. Automatic Speech Recognition - An Overview
Presented by Preethi Jyothi, IIT Bombay, Sep 2017
2. State-of-the-Art in Speech Technologies
Presented by Preethi Jyothi, IIT Bombay, Jan 2018
3. Lecture 2 | Word Vector Representations: word2vec
Lecture 2 of Natural Language Processing with Deep Learning
4. Step by step through a spectrogram
Presented by Andy McMillin, Clinical Associate Professor, Portland State University

Editor's Notes

  1. The waveform is analyzed in short windows because a speech signal is approximately stationary over about 25 milliseconds. Starting from the raw speech waveform, we cut it into short slices called speech frames, and each speech frame is represented by one feature vector.
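As a worked example of the framing arithmetic, assume a 16 kHz sampling rate, a 10 ms hop between frame starts, and 1 s of audio (N = 16000 samples); these values are illustrative assumptions, and the incomplete final frame is dropped:

```latex
N_{\text{frame}} = 0.025\,\mathrm{s} \times 16000\,\mathrm{Hz} = 400 \text{ samples}, \qquad
N_{\text{hop}} = 0.010\,\mathrm{s} \times 16000\,\mathrm{Hz} = 160 \text{ samples}

N_{\text{frames}} = 1 + \left\lfloor \frac{N - N_{\text{frame}}}{N_{\text{hop}}} \right\rfloor
= 1 + \left\lfloor \frac{16000 - 400}{160} \right\rfloor = 1 + 97 = 98
```

So one second of speech becomes a sequence of 98 feature vectors, and consecutive frames overlap by 400 − 160 = 240 samples.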
  2. Amy Costanza Smith
  3. Academic Research Summit, which was co-organized by Microsoft Research, was held at the International Institute of Information Technology (IIIT) Hyderabad on the 24th and 25th of January 2018.