What is Auto Speech Recognition (ASR)?


Automatic speech recognition (ASR) is the technology that converts speech to written text. There are two main approaches: static systems, which apply acoustic, pronunciation, and language models in sequence; and end-to-end neural systems, which use deep neural networks for acoustic and language modeling. Challenges for ASR systems include noise, variation in speaker accent and age, transfer learning across dialects, and running locally on devices without an internet connection.


What is Auto Speech Recognition (ASR)?
Voice (“Hello, how are you?”) → Speech Waveform → Feature from Audio (e.g., Spectrogram) → Auto ML → Text
Automatic Speech Recognition (ASR)
The technology of converting speech into written form (also called speech-to-text), so that humans can read and interpret its meaning.

How to Understand Speech: Approach by Educated Human
• Reading Spectrogram
Source: “Step by step through a spectrogram”, https://www.youtube.com/watch?v=lfZ6XSRaRR8
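Before a spectrogram can be read, it has to be computed. A minimal sketch of a log-magnitude spectrogram via a short-time Fourier transform is shown below, assuming NumPy, a 16 kHz sample rate, 25 ms Hann windows, and a 10 ms hop (common choices, not values taken from the slides). A synthetic 440 Hz tone stands in for real speech so the result is easy to check: the strongest frequency bin should sit at 440 Hz.

```python
import numpy as np

def spectrogram(signal, sample_rate, win_ms=25.0, hop_ms=10.0):
    """Log-magnitude spectrogram computed with a short-time Fourier transform.

    Returns an array of shape (n_frames, n_bins): one row per time frame and
    one column per frequency bin, from 0 Hz up to sample_rate / 2.
    """
    win = int(round(sample_rate * win_ms / 1000))  # samples per analysis window
    hop = int(round(sample_rate * hop_ms / 1000))  # samples between window starts
    window = np.hanning(win)                        # taper to reduce spectral leakage
    n_frames = 1 + (len(signal) - win) // hop
    frames = np.stack([signal[i * hop:i * hop + win] * window for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20 * np.log10(magnitude + 1e-10)         # decibels; epsilon avoids log(0)

sr = 16000
t = np.arange(sr) / sr                  # one second of audio
tone = np.sin(2 * np.pi * 440.0 * t)    # synthetic 440 Hz tone standing in for speech
spec = spectrogram(tone, sr)
n_fft = 2 * (spec.shape[1] - 1)         # window length recovered from the bin count
peak_bin = int(spec.mean(axis=0).argmax())
peak_hz = peak_bin * sr / n_fft         # bin spacing is sr / n_fft = 40 Hz here
print(spec.shape, peak_hz)              # (98, 201) 440.0
```

Plotting `spec.T` as an image (time on the x-axis, frequency on the y-axis) gives the familiar picture; with real speech, formants show up as dark horizontal bands.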

How to Understand Speech: Approach by AI
• Static ASR system
Sound Waveform
↓
Acoustic Feature Extraction
↓
Acoustic Model
↓
Pronunciation Model
↓
Language Model
↓
Text
• End-to-End Neural ASR System
Sound Waveform
↓
Acoustic Feature Extraction
↓
Acoustic Model with DNN
↓
Language Model with DNN
↓
Text
(Graves and Jaitly, 2014, “Towards End-to-End Speech
Recognition with Recurrent Neural Networks”, ICML)
(Chan et al., 2016, “Listen, Attend and Spell: A Neural Network
for Large Vocabulary Conversational Speech Recognition”, ICASSP)
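Graves and Jaitly train their recurrent network with Connectionist Temporal Classification (CTC), which emits one symbol (or a special “blank”) per audio frame. The simplest way to turn those frame scores into text is greedy, or best-path, decoding: take the most likely symbol in each frame, collapse consecutive repeats, then drop blanks. The sketch below assumes NumPy, a toy four-symbol alphabet with the blank at index 0, and made-up frame probabilities. Listen, Attend and Spell uses an attention-based decoder instead, which this sketch does not cover.

```python
import numpy as np

def ctc_greedy_decode(log_probs, alphabet, blank=0):
    """Best-path CTC decoding.

    log_probs: array of shape (T, V) with per-frame scores over V symbols.
    alphabet:  list of V symbols; index `blank` is the CTC blank.
    """
    best = np.argmax(log_probs, axis=1)  # most likely symbol in each frame
    out = []
    prev = None
    for idx in best:
        # Keep a symbol only if it differs from the previous frame's symbol
        # (collapse repeats) and it is not the blank.
        if idx != prev and idx != blank:
            out.append(alphabet[idx])
        prev = idx
    return "".join(out)

alphabet = ["_", "a", "c", "t"]  # "_" is the blank (toy alphabet, assumed)
# Hypothetical frame-wise best path: c c _ a a _ t t
path = [2, 2, 0, 1, 1, 0, 3, 3]
log_probs = np.log(np.full((len(path), len(alphabet)), 0.1))
log_probs[np.arange(len(path)), path] = np.log(0.7)  # each row sums to 1.0
print(ctc_greedy_decode(log_probs, alphabet))  # cat
```

The blank is what lets CTC spell doubled letters: the path `a _ a` decodes to “aa”, while `a a` collapses to a single “a”.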

Challenges in a System Using Only Acoustic Model
Word “Probably”
Dictionary pronunciation: p r aa b ax b l iy
Actual pronunciations (many common ways to pronounce it):
p r aa b iy
p r aw l uh
p r aa l iy
p aa b uh b l iy
p ow ih
p aa iy
p r ah b iy
(Preethi Jyothi, 2017, “Automatic Speech Recognition - An Overview”)
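Example: “I ___ play tennis.”
• probably (p r aa b iy)
• probability (p r aa b iy)
Both candidates can match the same phone sequence, so an acoustic model alone cannot choose between them; a Language Model resolves the ambiguity by preferring the far more likely word sequence “I probably play tennis.”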
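The role of the language model in this example can be sketched as a noisy-channel choice: pick the word that maximizes log P(audio | word) + log P(word sequence). In the sketch below, the acoustic scores are tied (both words share the phones p r aa b iy), and a toy add-one-smoothed bigram model breaks the tie. All counts, the vocabulary size, and the acoustic scores are hypothetical numbers chosen for illustration, not data from the slides.

```python
import math

# Hypothetical bigram and unigram counts from an imaginary text corpus.
BIGRAM_COUNTS = {
    ("i", "probably"): 30, ("probably", "play"): 12,
    ("i", "probability"): 1, ("probability", "play"): 1,
}
UNIGRAM_COUNTS = {"i": 1000, "probably": 50, "probability": 40}
VOCAB_SIZE = 10_000  # assumed vocabulary size for add-one smoothing

def bigram_logprob(prev, word):
    """Add-one smoothed log P(word | prev)."""
    num = BIGRAM_COUNTS.get((prev, word), 0) + 1
    den = UNIGRAM_COUNTS.get(prev, 0) + VOCAB_SIZE
    return math.log(num / den)

def sentence_logprob(words):
    """Sum of bigram log-probabilities over consecutive word pairs."""
    return sum(bigram_logprob(p, w) for p, w in zip(words, words[1:]))

def pick(candidates, acoustic_logprob, left, right):
    """Return the candidate maximizing acoustic score + language-model score."""
    def score(word):
        return acoustic_logprob[word] + sentence_logprob(left + [word] + right)
    return max(candidates, key=score)

# Both words map to the same phones, so the acoustic model ties them.
acoustic = {"probably": -5.0, "probability": -5.0}
best = pick(["probably", "probability"], acoustic, ["i"], ["play", "tennis"])
print(best)  # probably
```

Real systems combine the two scores inside a search over many hypotheses and usually weight the language-model term, but the decision rule is the same.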

Beyond Text – Insights
Insight
• Topic Classification
• Semantic Parsing and Question Answering
• Customer Segmentation/ Prioritization
• Summary Generation
Voice → Sound Waveform → Feature → Auto ML → Text → Insights!

Current Challenges in ASR
• Noisy real-life conversations with multiple speakers
• Robustness to variations in speaker age and accent
• Sharing effort across multiple dialects with transfer learning
• Embedded ASR running locally on mobile devices without an internet connection
• Bad channel conditions (intermittently dropping voice)
(Preethi Jyothi, 2018, “State-of-the-Art in Speech Technologies”)

Great Study Materials on Speech Recognition on YouTube
Deep Learning Lecture Series
1. “CS231N Winter 2016”
Convolutional Neural Networks for Visual Recognition by Stanford University (Andrej Karpathy version), 16 videos
2. “CS231N Spring 2017”
Convolutional Neural Networks for Visual Recognition by Stanford University, 16 videos
3. “CS224N Winter 2017”
Natural Language Processing with Deep Learning by Stanford University, 19 videos
Auto Speech Recognition Sessions
1. Automatic Speech Recognition - An Overview
Presented by Preethi Jyothi, IIT Bombay, Sep 2017
2. State-of-the-Art in Speech Technologies
Presented by Preethi Jyothi, IIT Bombay, Jan 2018
3. Lecture 2 | Word Vector Representations: word2vec
Lecture 2 in Natural Language Processing with Deep Learning
4. Step by step through a spectrogram
Lecturer: Andy McMillin, Clinical Associate Professor at Portland State University

Editor's Notes

The reason the signal is analyzed in short windows is that over roughly 25 milliseconds, a speech signal is approximately stationary. Starting from the raw speech waveform, we cut it into small slices called speech frames, and each frame is turned into one feature vector.
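The framing described in this note can be sketched in a few lines of NumPy. The 25 ms window comes from the note; the 16 kHz sample rate, the 10 ms hop between frames, and the use of per-frame log energy as the “feature” are assumptions for illustration (production systems typically compute richer features such as filterbank energies or MFCCs per frame). Random noise stands in for recorded speech.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def frame_signal(signal, sample_rate, win_ms=25.0, hop_ms=10.0):
    """Slice a waveform into overlapping frames of win_ms, starting every hop_ms."""
    win = int(sample_rate * win_ms / 1000)  # 400 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)  # 160 samples at 16 kHz
    # Every possible window start, then keep one start per hop.
    return sliding_window_view(signal, win)[::hop]

def log_energy(frames):
    """One scalar feature per frame: log of the summed squared samples."""
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

sr = 16000
rng = np.random.default_rng(0)
audio = rng.standard_normal(sr)         # one second of synthetic audio
frames = frame_signal(audio, sr)
features = log_energy(frames)
print(frames.shape, features.shape)     # (98, 400) (98,)
```

With these settings, one second of audio becomes 98 frames of 400 samples each, so the recognizer sees a sequence of 98 feature vectors rather than 16,000 raw samples.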
Amy Costanza Smith
The Academic Research Summit, co-organized by Microsoft Research, was held at the International Institute of Information Technology (IIIT) Hyderabad on the 24th and 25th of January 2018.
