Methods and algorithms of 
speech recognition 
Nikolay V. Karpov (nkarpov(а)hse.ru) 
Duration 
1 module, 10 weeks, 40 academic hours 
Requirements 
3 practical assignments done at home using MATLAB or other tools
(lms.hse.ru) 
Final assessment
What is the course about?
2 Modelling Speech Production Acoustics 
3 Time/Frequency Representation. Properties 
of Digital Filters 
4 Linear Predictive Modelling 
5-6 Speech Coding 
7 Phonetics 
8 Speech Synthesis 
9-10 Speech Recognition
References
• Lingvocourse.ru: http://lingvocourse.ru/wiki/index.php/Speech_recognition
• Jurafsky, Speech and Language Processing
• Sadaoki Furui, Digital Speech Processing, Synthesis, and Recognition, 2nd ed.
• Speech Analysis, Synthesis and Perception: http://hear.ai.uiuc.edu/ECE537/PDF/main-all.pdf
• Fundamentals of Speech Recognition: A Short Course: http://speech.tifr.res.in/tutorials/fundamentalOfASR_picone96.pdf
• Mike Brookes, Speech Processing (20 lectures in the Spring Term): http://www.ee.ic.ac.uk/hp/staff/dmb/courses/speech/speech.htm
Speech Processing Tasks 
Coding 
Synthesis (TTS) 
Recognition (STT) 
Identity Verification (Individual information) 
Emotion of speaker analysis 
Enhancement
Speech Coding 
What: To transmit/store a speech waveform using as few bits as 
possible while retaining high quality 
Why: To save bandwidth in telecoms applications and to reduce 
memory storage requirements. 
How: 
Correlation ⇒ Predictability ⇒ Redundancy
◦ Predict waveform samples from previous samples and transmit only the prediction error
◦ The autocorrelation is the Fourier transform of the power spectrum: a peaky spectrum ⇒ strong short-term correlations (~ 0.5 ms)
◦ Voiced speech is almost periodic ⇒ strong long-term correlations (~ 10 ms)
Devote fewer bits to the aspects of speech where errors are least noticeable
◦ High amplitude speech will mask noise at the same frequency 
Ignore aspects of the speech that are inaudible 
◦ Power spectrum is much more important than precise waveform 
◦ For aperiodic sounds, the fine detail of the spectrum does not matter
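The "predict from previous samples, transmit only the error" idea can be sketched in a few lines. This is a minimal illustration, not the course's implementation: it fits the predictor by ordinary least squares rather than by the autocorrelation method usually used in speech coders, and the test signal, the order of 8 and all function names are assumptions made for the example.

```python
import numpy as np

def lagged_matrix(x, order):
    """Row for sample n holds the `order` samples preceding x[n], most recent first."""
    return np.array([x[n - order:n][::-1] for n in range(order, len(x))])

def fit_predictor(x, order):
    """Least-squares coefficients a[k] such that x[n] ~ sum_k a[k] * x[n-1-k]."""
    a, *_ = np.linalg.lstsq(lagged_matrix(x, order), x[order:], rcond=None)
    return a

def prediction_error(x, a):
    """Residual that a coder would transmit instead of the raw samples."""
    return x[len(a):] - lagged_matrix(x, len(a)) @ a

def prediction_gain_db(x, order):
    """Ratio of signal power to residual power, in dB."""
    e = prediction_error(x, fit_predictor(x, order))
    return 10 * np.log10(np.mean(x[order:] ** 2) / np.mean(e ** 2))

# Hypothetical test signal: two sinusoids (a "peaky" spectrum) plus weak noise.
fs = 8000
t = np.arange(400) / fs
rng = np.random.default_rng(0)
x = (np.sin(2 * np.pi * 500 * t)
     + 0.5 * np.sin(2 * np.pi * 1500 * t)
     + 0.01 * rng.standard_normal(t.size))

print(f"prediction gain: {prediction_gain_db(x, order=8):.1f} dB")
```

A large prediction gain means the residual carries far less energy than the waveform, so it can be coded with fewer bits. For white noise, which has a flat spectrum and no correlation between samples, the same function should report a gain close to 0 dB.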
Speech Synthesis 
What: To convert a text string into a speech waveform 
Why: For technology to communicate when a display would be inconvenient 
because: 
(a) Too big, (b) Eyes busy, (c) Via phone, (d) In the dark, (e) Moving around 
Problems: 
The spelling of words doesn’t match their sound 
◦ Pronunciation rules + an exceptions dictionary 
Some words have multiple meanings + sounds 
◦ Must guess which is the correct sound 
Simplistic speech models sound mechanical 
◦ Can use extracts from real speech 
Speech sounds are influenced by adjacent phonemes 
◦ Use phoneme pairs from real speech 
Important words must be slightly louder 
◦ Must try to understand the text unit 
Voice pitch and talking speed must vary smoothly throughout a sentence 
◦ Must be able to change pitch and speed without affecting formant frequencies
Speech Recognition 
What: To convert a speech waveform into text 
Why: To communicate and control technology when a keyboard would be inconvenient 
because: 
(a) Too big, (b) Hands busy, (c) Via phone, (d) In the dark, (e) Moving around 
Problems: 
The spelling of words doesn’t match their sound 
◦ Have a big phonetic dictionary 
The waveform of a word varies a lot between different speakers (or even the same 
speaker) 
◦ Extract features from the speech waveform that are more consistent than the 
waveform 
The extracted features won’t be exactly repeatable 
◦ Characterize them with a probability distribution 
Speech sounds are influenced by adjacent phonemes 
◦ Use context-dependent probability distributions 
Speaking speed varies enormously 
◦ Try all possible speaking speeds 
No clear boundary between words or phonemes 
◦ Try all possible boundaries
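Trying every speaking speed and every boundary by brute force would be far too slow, so in practice the search is organised with dynamic programming. The slides do not name a method here; the sketch below uses dynamic time warping (DTW) on one-dimensional feature sequences purely as an illustration, and the sequences and function name are hypothetical.

```python
import numpy as np

def dtw_distance(a, b):
    """Minimum total mismatch between sequences a and b over all monotonic alignments."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Either sequence may dwell on a frame, which models a change of speed.
            cost[i, j] = local + min(cost[i - 1, j],      # a advances, b dwells
                                     cost[i, j - 1],      # b advances, a dwells
                                     cost[i - 1, j - 1])  # both advance
    return float(cost[n, m])

template = [1, 2, 3, 2, 1]                    # stored reference "word"
slow = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]         # same word spoken at half speed
other = [3, 1, 3, 1, 3]                       # a different word

print(dtw_distance(template, slow), dtw_distance(template, other))
```

The half-speed version aligns with the template at zero cost, while the different pattern cannot be aligned without mismatch. Real recognisers apply the same idea to multi-dimensional feature vectors and probabilistic scores.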
Linguistic information 
Speech waves convey:
Speaker meaning
Individual information
Emotion of speaker
Phrase (sentence) -> word units -> words -> syllables -> phonemes -> phones
• A phone is the acoustic realization of a phoneme
• Allophones are context-dependent variants of a phoneme
Phonemes 
Russian 
а э и о у ы п п' б б' м м' ф ф' в в' т т' д д' н н' с с' з з' р р' л л' ш ж щ җ ц ч й к к' г г' х х'
English 
http://en.wikipedia.org/wiki/English_phonology
Phonemes
• Speakers and listeners divide words into component sounds called phonemes.
◦ Native speakers agree on the phonemes that make up a particular word
◦ There are about 42 phonemes in English
• The phonemes in a particular word may vary with dialect
• The actual sound that corresponds to a particular phoneme depends on:
◦ the adjacent phonemes in the word or sentence
◦ the accent of the speaker
◦ the talking speed
◦ whether it is a formal or informal occasion
SPEECH PRODUCTION 
MECHANISM
Sources of Sound Energy
• Turbulence: air moving quickly through a small hole (e.g. /s/ in “size”)
• Explosion: pressure built up behind a blockage is suddenly released (e.g. /p/ in “pop”)
• Vocal fold (vocal cord) vibration
◦ airflow through the vocal folds reduces the pressure and they snap shut (Bernoulli effect)
◦ muscle tension and air pressure buildup force the folds open again and the process repeats
◦ the frequency of vibration (fx) is determined by the tension in the vocal folds and the pressure from the lungs
◦ for normal breathing and voiceless sounds (e.g. /s/) the vocal folds are held wide open and don’t vibrate
Phoneme Classification
Vowel /а/, /о/, /у/
Consonant
◦ Unvoiced
  • Fricative /ш/, /щ/, /ф/, /х/
  • Plosive /п/, /к/, /т/
  • Affricate /ч/, /ц/
◦ Voiced
  • Fricative /ж/, /җ/, /в/, /р/
  • Plosive /б/, /г/, /д/
  • Diphthongs /oj/
  • Nasal /н/, /м/
  • Semivowel /r/, /j/, /w/
Vocal Tract Filter
• The sound spectrum is modified by the shape of the vocal tract, which is determined by movements of the jaw, tongue and lips.
• The resonant frequencies of the vocal tract cause peaks in the spectrum called formants.
• The first two formant frequencies are roughly determined by the distances from the tongue hump to the larynx and to the lips respectively.
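To get a feel for where vocal tract resonances lie, a common simplification (not stated on the slide) treats the tract as a uniform tube closed at the larynx and open at the lips. Such a tube resonates at odd multiples of the quarter-wavelength frequency, F_k = (2k - 1) c / (4 L). The tube length of 0.17 m and sound speed of 340 m/s below are assumed textbook-style values, and the function name is hypothetical.

```python
def tube_formants(length_m, n_formants=3, speed_of_sound=340.0):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    return [(2 * k - 1) * speed_of_sound / (4.0 * length_m)
            for k in range(1, n_formants + 1)]

# Assumed adult vocal tract length of about 17 cm.
for k, f in enumerate(tube_formants(0.17), start=1):
    print(f"F{k} ~ {f:.0f} Hz")
```

With these assumptions the resonances fall near 500, 1500 and 2500 Hz, which roughly matches a neutral vowel. Moving the tongue hump changes the tube shape and shifts the first two formants away from these values, which is what distinguishes one vowel from another.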
Phoneme Spectrum
Qualities of English vowels 
After Ladefoged 1993 
+ lip rounding
Speech waveform characteristics
• Loudness 
• Voiced/Unvoiced 
• Pitch 
• Fundamental frequency 
• Spectral envelope 
• Formants
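Several of these characteristics can be estimated from a short frame of samples. The sketch below is one simple way to do it, under stated assumptions: RMS energy stands in for loudness, the zero-crossing rate is used as a rough voiced/unvoiced cue, and the fundamental frequency is taken from the strongest autocorrelation peak within an assumed 60 to 400 Hz search range. The function names and test signals are hypothetical.

```python
import numpy as np

def rms_energy(frame):
    """Root-mean-square amplitude, a crude correlate of loudness."""
    return float(np.sqrt(np.mean(frame ** 2)))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (high for noisy, unvoiced frames)."""
    signs = np.sign(frame)
    return float(np.mean(signs[1:] != signs[:-1]))

def estimate_f0(frame, fs, fmin=60.0, fmax=400.0):
    """Fundamental frequency (Hz) from the largest autocorrelation peak in [fmin, fmax]."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags 0..N-1
    lag_min = int(fs / fmax)
    lag_max = int(fs / fmin)
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag

# Hypothetical 40 ms frames at 8 kHz: a 100 Hz "voiced" tone with one harmonic, and noise.
fs = 8000
t = np.arange(320) / fs
voiced = np.sin(2 * np.pi * 100 * t) + 0.5 * np.sin(2 * np.pi * 200 * t)
unvoiced = np.random.default_rng(0).standard_normal(t.size)

print("F0 estimate:", estimate_f0(voiced, fs))
print("ZCR voiced / unvoiced:", zero_crossing_rate(voiced), zero_crossing_rate(unvoiced))
print("RMS voiced:", rms_energy(voiced))
```

The autocorrelation approach relies on voiced speech being almost periodic, so the frame must span at least two pitch periods. Practical pitch trackers add voicing decisions and smoothing across frames, which this sketch omits.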
Lexical stress and Schwa 
Pitch accent 
Lexical stress 
Full vowels 
Reduced vowels; the most common is [ax] (schwa)
Speech Waveforms
A Source–Filter Model of Speech Production
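The source–filter idea can be sketched as an excitation signal passed through a chain of resonators that stand in for the vocal tract. This is a minimal illustration under assumptions of my own: an impulse train plays the role of vocal fold vibration, each formant is a two-pole resonator, and the formant frequencies and bandwidths (700 Hz and 1200 Hz, roughly /a/-like) are illustrative values, not taken from the slides.

```python
import numpy as np

def resonator(x, freq_hz, bandwidth_hz, fs):
    """Two-pole resonator: y[n] = x[n] + 2 r cos(theta) y[n-1] - r^2 y[n-2]."""
    r = np.exp(-np.pi * bandwidth_hz / fs)   # pole radius sets the bandwidth
    theta = 2 * np.pi * freq_hz / fs         # pole angle sets the centre frequency
    a1, a2 = 2 * r * np.cos(theta), -r * r
    y = np.zeros(len(x))
    for n in range(len(x)):
        y[n] = x[n]
        if n >= 1:
            y[n] += a1 * y[n - 1]
        if n >= 2:
            y[n] += a2 * y[n - 2]
    return y

def synthesize_vowel(f0, formants, fs=8000, duration=0.2):
    """Impulse-train source at pitch f0 filtered by cascaded formant resonators."""
    n = int(fs * duration)
    source = np.zeros(n)
    source[::int(round(fs / f0))] = 1.0      # one glottal pulse per pitch period
    out = source
    for freq_hz, bandwidth_hz in formants:
        out = resonator(out, freq_hz, bandwidth_hz, fs)
    return out / np.max(np.abs(out))         # normalise to peak amplitude 1

# Assumed formant (frequency, bandwidth) pairs in Hz.
vowel = synthesize_vowel(f0=100.0, formants=[(700.0, 80.0), (1200.0, 90.0)])
print(len(vowel), float(np.max(np.abs(vowel))))
```

Changing `f0` alters the pitch without moving the spectral peaks, and changing the formant list alters the vowel quality without changing the pitch. That independence is the main point of the model. Replacing the impulse train with white noise gives a whispered or fricative-like sound through the same filter.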
Speech Spectrogram
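A spectrogram shows how the power spectrum changes over time, computed frame by frame. The sketch below is one straightforward way to build it with a Hann window and the FFT; the frame length of 256 samples, the hop of 128 samples and the function name are assumptions chosen for the example.

```python
import numpy as np

def spectrogram(x, fs, frame_len=256, hop=128):
    """Return (times, freqs, power_db), where power_db has shape (n_frames, n_bins)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs   # frame centres in seconds
    return times, freqs, 10 * np.log10(power + 1e-12)          # small offset avoids log(0)

# Hypothetical input: one second of a 1 kHz tone sampled at 8 kHz.
fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)
times, freqs, power_db = spectrogram(tone, fs)
print(power_db.shape, freqs[int(np.argmax(power_db[0]))])
```

Short frames give good time resolution but blur the frequency axis, and long frames do the opposite. With speech, short frames tend to show individual pitch pulses while long frames resolve individual harmonics.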
Thank you for your attention

Principal characteristics of speech
