Speech Technology Overview
Presented by Amr Medhat
Computer Engineering Department, Cairo University
22-10-2005
Speech… Why?
   The easiest way of communication for human beings
Speech… How?
   [Diagram: a Sender passes a Message (Signal + … Protocol) through a Channel, where Noise is added, to a Receiver]
Computer Analogy
   Speech Production   corresponds to   Speech Synthesis (TTS): text → speech
   Speech Perception   corresponds to   Speech Recognition (ASR): speech → text
Recognition Made Easy
             I bought a boat.
             ‫افرنقعوا أيها المتكأكئين‬
                gute Nacht
   [Diagram: Feature Extraction → Decoder (Search); the decoder draws on the Grammar, the Lexicon, and the Phone Models]
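The decoder box in the diagram above can be sketched as a toy search. This is only an illustration under loud assumptions: the phone symbols, the lexicon entries (which treat "bought" and "boat" as sharing one pronunciation), and the word-pair scores standing in for the grammar are all hypothetical, and feature extraction and phone models are skipped entirely by starting from already recognized phone groups. A real decoder would use a pruned search rather than trying every combination.

```python
from itertools import product

# Hypothetical lexicon: phone string -> candidate words.
# "bought" and "boat" share one entry only to mimic the near-homophones
# in the slide's example sentence.
LEXICON = {
    "ay": ["I"],
    "b ao t": ["bought", "boat"],
    "ah": ["a"],
}

# Hypothetical grammar: made-up scores for adjacent word pairs.
BIGRAM = {
    ("<s>", "I"): 0.9,
    ("I", "bought"): 0.8,
    ("I", "boat"): 0.1,
    ("bought", "a"): 0.7,
    ("boat", "a"): 0.2,
    ("a", "boat"): 0.8,
    ("a", "bought"): 0.05,
}

def decode(phone_groups):
    """Try every word sequence the lexicon allows; keep the best grammar score."""
    candidates = [LEXICON[group] for group in phone_groups]
    best_words, best_score = None, 0.0
    for words in product(*candidates):
        score = 1.0
        previous = "<s>"  # sentence-start marker
        for word in words:
            # Unseen word pairs get a small floor score instead of zero.
            score *= BIGRAM.get((previous, word), 0.01)
            previous = word
        if score > best_score:
            best_words, best_score = list(words), score
    return best_words, best_score

heard = ["ay", "b ao t", "ah", "b ao t"]
words, score = decode(heard)
print(" ".join(words), score)
```

The lexicon alone leaves four candidate sentences; only the grammar scores separate them, which is the point the "I bought a boat" example is making.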
Recognizer Characteristics
   Discrete words / continuous speech
   Read / spontaneous speech
   Speaker dependent / independent
   Small / large vocabulary
   Finite state / context sensitive language model
What to study
   Phonetics and Phonology (Linguistics)
   Speech Signal Processing (DSP)
   Pattern Recognition (AI)
     Hidden Markov Models
     Artificial Neural Networks
     Hybrid ANN - HMM
Phonetics
   Phonetics: study of the production, perception,
    and physical properties of speech sounds
   Phonology: describes the way sounds function
    within a given language and how they are
    combined and organized
   Phoneme: The smallest phonetic unit in a
    language that is capable of conveying a
    distinction in meaning
   E.g.
     boat-bought,   car-jar, ‫نشاط-شمس ,أرض-أحمد‬
Speech Signal Processing
   Sampling
     Rate: e.g. 16 kHz
     Sample size: e.g. 16 bits
   Format: PCM (.wav files)
   Time or frequency domain features?
   Spectrogram: represents the time-varying spectrum of a signal (x: time, y: frequency, intensity: energy)
   Feature representations:
     Filter banks, LPCs, MFCCs
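To make the sampling and spectrogram bullets concrete, here is a minimal sketch using only the Python standard library. The 16 kHz rate and 16-bit sample size come from the slide; the 1 kHz test tone, the 256-sample frame, the 50% overlap, and the Hamming window are assumptions picked for illustration. It uses a plain DFT for clarity, where a practical front end would use an FFT and go on to compute filter-bank or MFCC features.

```python
import cmath
import math

SAMPLE_RATE = 16000   # 16 kHz, as on the slide
FRAME_LEN = 256       # assumed frame size: 16 ms at 16 kHz
HOP = 128             # assumed 50% overlap between frames
# At 16 bits per sample, one second of mono PCM takes 16000 * 2 = 32000 bytes.

def make_tone(freq_hz, n_samples):
    """Synthesize a sine wave as floats in [-1, 1] (stand-in for recorded speech)."""
    return [math.sin(2 * math.pi * freq_hz * t / SAMPLE_RATE) for t in range(n_samples)]

def quantize_16bit(samples):
    """Map floats in [-1, 1] to signed 16-bit PCM integers."""
    return [max(-32768, min(32767, int(round(s * 32767)))) for s in samples]

def frame_spectrum(frame):
    """Magnitude spectrum of one Hamming-windowed frame via a direct DFT."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    magnitudes = []
    for k in range(n // 2 + 1):  # bins from 0 Hz up to SAMPLE_RATE / 2
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        magnitudes.append(abs(acc))
    return magnitudes

def spectrogram(samples):
    """One magnitude spectrum per frame: rows are time, columns are frequency."""
    return [frame_spectrum(samples[start:start + FRAME_LEN])
            for start in range(0, len(samples) - FRAME_LEN + 1, HOP)]

pcm = quantize_16bit(make_tone(1000.0, 800))   # 800 samples = 50 ms
spec = spectrogram(pcm)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * SAMPLE_RATE / FRAME_LEN
print(len(spec), "frames; strongest component in first frame near", peak_hz, "Hz")
```

Each row of `spec` is one column of the picture usually drawn as a spectrogram: time runs across frames, frequency across bins, and the magnitude is the intensity.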
Spectrogram
   [Figure: waveform and spectrogram of the word "phonetician"]
HMM
   What is a model?
   The coins example
   Parameter estimation: Baum-Welch
   Decoding: Viterbi, P(O | λ)
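The coins example can be sketched as a two-state HMM in which the hidden state is which coin is being tossed and the observation is heads or tails. All probabilities below are hypothetical values chosen for illustration, not taken from the slides. The sketch separates two quantities that are easy to blur: the forward algorithm sums over all state paths to give P(O | λ), while Viterbi finds the single most likely path and its probability. Baum-Welch parameter estimation is not shown.

```python
STATES = ["fair", "biased"]

# Hypothetical model parameters (lambda): start, transition, emission.
START = {"fair": 0.5, "biased": 0.5}
TRANS = {
    "fair": {"fair": 0.9, "biased": 0.1},
    "biased": {"fair": 0.2, "biased": 0.8},
}
EMIT = {
    "fair": {"H": 0.5, "T": 0.5},
    "biased": {"H": 0.9, "T": 0.1},
}

def forward(obs):
    """Evaluation: P(O | lambda), summed over every possible state path."""
    alpha = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for symbol in obs[1:]:
        alpha = {
            s: sum(alpha[p] * TRANS[p][s] for p in STATES) * EMIT[s][symbol]
            for s in STATES
        }
    return sum(alpha.values())

def viterbi(obs):
    """Decoding: the most likely single state path and its joint probability."""
    delta = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    backpointers = []
    for symbol in obs[1:]:
        step, new_delta = {}, {}
        for s in STATES:
            # Best predecessor for state s at this time step.
            prev = max(STATES, key=lambda p: delta[p] * TRANS[p][s])
            step[s] = prev
            new_delta[s] = delta[prev] * TRANS[prev][s] * EMIT[s][symbol]
        backpointers.append(step)
        delta = new_delta
    last = max(STATES, key=lambda s: delta[s])
    path = [last]
    for step in reversed(backpointers):  # trace the best path backwards
        path.append(step[path[-1]])
    path.reverse()
    return path, delta[last]

obs = list("HHHHTTHT")
likelihood = forward(obs)
path, path_prob = viterbi(obs)
print("P(O | lambda) =", likelihood)
print("best path:", path, "with probability", path_prob)
```

The Viterbi probability can never exceed the forward likelihood, since it is one term of the sum the forward algorithm computes.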
Tools
   Audio Editing
     Cool Edit
     GoldWave
     Sound Forge
   ASR
     HTK
     MATLAB
     Microsoft SAPI SDK
     Java Speech API
     ISIP ASR Toolkit
     Torch (Machine learning tool)
Technologies and applications
   Speech Recognition
     Dictation
     Call centers & IVR systems
     Command and control

   Speech Verification: Pronunciation teaching
   Speaker Recognition: Security
   Speech Synthesis
     Reading for the blind
     Telephone inquiries
Can Image Processing Help?
   Audio Visual Speech Recognition
   Spectrogram Reading
   Spectrogram Filtering
   The vOICe: seeing with sound

Editor's Notes

  1. Why do we need speech technology? Why develop computer technologies that handle human speech? Speech is the easiest way for people to communicate, so why not communicate with computers the same way? It would be really great. Do you remember the definition of AI? Solving problems that humans can do better. A very young child can speak, hear, and understand you, yet cannot read, write, or do even simple calculations. That is why we need speech technology.
  2. Like any communication system, speech communication involves a message that must be carried from a sender to a receiver through a channel. The sender turns the message in his mind into signals to the speech organs, which set the vocal cords, throat, tongue, lips, and lungs in a particular configuration. The air passing through them becomes compressions and rarefactions with a particular vibration pattern, and these travel through the air to the other party. The receiver collects these signals at the outer ear; they pass through the eardrum and the middle ear (the hammer, anvil, and stirrup) to the cochlea in the inner ear, where they are converted into nerve signals. The brain then searches for what they signify until it arrives at the meaning of the message and understands it. Of course, the medium also carries other signals spreading through the air, such as the sound of the fan, cars, students outside, buzzing, and so on. All of this is called noise: the channel does not carry only the sender's signal; it carries many signals combined into one complex signal, and the receiver must first do some processing to filter the noise out. But even then, will you be able to understand the incoming signal after all this processing and filtering? Imagine I am talking in Japanese and you understand only German. Or will the air conditioner understand the signal transmitted by a TV remote control? So the message is not just a signal; it is a signal plus a communication protocol agreed upon between sender and receiver. In speech, message = signal + language.
  3. Our focus is mainly on ASR. Note: besides the microphone and speakers, the computer's sound card, with its A/D and D/A converters, plays the role of the ear and mouth (the physical part of speech processing). Note: the microphone converts acoustic pressure (the compressions and rarefactions of sound) into an analog electrical signal; the speakers do the opposite.
  4. After the audience has heard the three sentences from you (without displaying them), ask them what they understood from each utterance. Assuming you know only English (neither Arabic nor German), you will not understand the third sentence: your ear notices a strange sound (ch, خ) that it cannot perceive. In the second sentence (assuming you know Arabic), your ear can perceive every pronounced sound (you have what are called phone models in the sound database in your brain), and you can sense the sentence structure (an imperative verb) because you also have the language's grammar in your brain. But you cannot understand the sentence, because you have no meanings for the words you heard in your dictionary (the word lexicon in your brain). For the first sentence (assuming you know English), the uttered sounds are fine and so are the words, but two words have almost the same pronunciation. You can only just tell them apart with the aid of the language grammar, which tells you that the first word is a verb while the second is a noun. This example makes clear that speech perception is a search process that the brain performs in a fraction of a second, trying to find the best match for the heard utterance in a large knowledge base made up of language sounds, a word dictionary, and the language grammar. From this comes the structure of a speech recognition engine.
  5. Read speech: the words are expected and anticipated before they are uttered. Spontaneous speech: unplanned, unanticipated speech. Speaker-dependent: the engine needs to build a special profile for every user and be trained on that user's voice and way of speaking before it can run properly and give acceptable results. Finite-state language model: a small set of sentences with limited scope, such as telephone numbers. Context-sensitive language model: unlimited in scope and dependent on the context of the speech; it needs a complicated NLP system.
  6. Phonology answers the question: what sounds exist in this language? Phonetics answers the question: what are the properties of these sounds? Phonetics is the study of the sounds of languages from three basic points of view: their production in the vocal organs (articulatory phonetics), their physical properties (acoustic phonetics), and their effect on the ear (auditory phonetics).
  7. When a child starts learning and sees a dog, he asks you what it is, and you tell him it is a dog. Later, when he sees a donkey or a cat, he points to it and says it is a dog; you tell him no, this is a donkey and this is a cat. After that he points to your cat and says it is a cat; you tell him it is not just a cat, it is my cat, and its name is Poosy. This is the idea of a model. At first the child formed a model in his mind in which any animal (a four-legged creature) is a dog. Then he narrowed his model to dogs, donkeys, and cats, and then narrowed it again to Poosy the cat. The same idea applies to a mathematical model. You choose your models according to the size and nature of your system. If the system only has to recognize one of three sentences, you might build one HMM per sentence. If it searches a dictionary of 10 words, build an HMM for each word. If it searches over combinations of words in different orders, narrow your models to the level of sub-words, triphones, monophones, or even allophones, according to the system size and the search tree size and depth the system can bear. Note that the number of states in your model is a function of the model size you choose (that is, a function of the feature vector, or in other words of the time length of the unit of utterance you model, usually ranging from a whole word down to a sub-phone).
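
The last note's point that the number of states depends on the unit being modelled can be sketched as follows. The pronunciations in the lexicon, the three states per phone, and the self-loop probability are all assumptions made for illustration (three emitting states per phone is a common convention, but nothing in the notes prescribes it). The sketch builds a word model by concatenating phone models and lays out a left-to-right transition matrix in which each state either repeats or advances.

```python
STATES_PER_PHONE = 3  # assumption: a common convention, not from the notes

# Hypothetical pronunciation lexicon (phone symbols are illustrative).
LEXICON = {
    "boat": ["b", "ow", "t"],
    "car": ["k", "aa", "r"],
}

def word_state_names(word):
    """Name the states of a word model built by concatenating phone models."""
    return [
        f"{phone}_{i}"
        for phone in LEXICON[word]
        for i in range(1, STATES_PER_PHONE + 1)
    ]

def left_to_right_transitions(n_states, self_loop=0.6):
    """Transition matrix where each state loops on itself or moves to the next."""
    rows = []
    for i in range(n_states):
        row = [0.0] * n_states
        if i + 1 < n_states:
            row[i] = self_loop            # stay: the sound lasts another frame
            row[i + 1] = 1.0 - self_loop  # advance to the next state
        else:
            row[i] = 1.0                  # last state absorbs (no exit state here)
        rows.append(row)
    return rows

names = word_state_names("boat")
A = left_to_right_transitions(len(names))
print(len(names), "states:", names)
```

A whole-word model for "boat" built this way has nine states, while a single phone model has three, so moving to smaller units shrinks each model but increases the number of models the search has to chain together.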