SlideShare a Scribd company logo
1 of 20
Digital Speech Processing

 Dept. of Computer Science & Engineering
 Shahjalal University of Science & Technology
Course Description
– Review of digital signal processing

– Fundamentals of speech production and perception

– Basic techniques for digital speech processing:
   • short - time energy, magnitude, autocorrelation
   • short - time Fourier analysis
   • homomorphic methods
   • linear predictive methods
Course Description…
– Speech estimation methods
    • speech/non-speech detection
    • voiced/unvoiced/non-speech segmentation/classification
    • pitch detection
    • formant estimation
– Applications of speech signal processing
    • Speech coding
    • Speech synthesis
    • Speech recognition/natural language processing
Book Information
Textbook:
L. R. Rabiner and R. W. Schafer,
-Theory and Applications of Digital Speech
Processing, Prentice-Hall Inc., 2011

Recommended Supplementary Textbook:
• T. F. Quatieri, Principles of Discrete - Time Speech
Processing, Prentice Hall Inc, 2002
Laboratory works: Using Matlab
Speech Processing
Speech is one of the most intriguing signals that humans work
  with every day.

• Purpose of speech processing:
– To understand speech as a means of communication;
– To represent speech for transmission and reproduction;
– To analyze speech for automatic recognition and extraction of
   information
– To discover some physiological characteristics of the talker.
The Speech Stack

• Speech Applications —
  coding, synthesis, recognition, understanding, ve
  rification, language translation, speed-up/slow-
  down
• Speech Algorithms —speech-silence
  (background), voiced-unvoiced decision, pitch
  detection, formant estimation
• Speech Representations —
  temporal, spectral, homomorphic, LPC
• Fundamentals —
  acoustics, linguistics, pragmatics, speech
  perception
Speech Coding
Speech Coding

Speech Coding is the process of transforming a speech
  signal into a representation for efficient transmission
  and storage of speech
   – narrowband and broadband wired telephony
   – cellular communications
   – Voice over IP (VoIP) to utilize the Internet as a real-time
      communications medium
   – secure voice for privacy and encryption for national
      security applications
   – extremely narrowband communications
      channels, e.g., battlefield applications using HF radio
   – storage of speech for telephone answering
      machines, prerecorded messages
Speech Synthesis
Synthesis of Speech is the process of generating a speech
signal using computational means for effective human-
machine interactions.
Speech Synthesis…
– machine reading of text or email messages
– talking agents for automatic transactions
– automatic agent in customer care call center
– handheld devices such as foreign language
  phrasebooks, dictionaries, crossword puzzle
  helpers
– announcement machines that provide information
  such as stock quotes, airlines schedules, weather
  reports, etc.
Speech Synthesis Examples




    Natural     Synthetic
Speech Recognition and understanding

Recognition and Understanding of Speech is the process of
  extracting usable linguistic information from a speech signal
  in support of human-machine communication by voice
   – command and control (C&C) applications, e.g., simple commands for
      spreadsheets, presentation graphics, Appliances
   – voice dictation to create letters, memos, and other documents
   – natural language voice dialogues with machines to enable Help
      desks, Call Centers
   – voice dialing for cellphones and from PDA’s and other small devices
   – agent services such as calendar entry and update, address list
      modification and entry, etc.
Speech Recognition Demos
Pattern Matching Problems
Pattern Matching Problems
• Speech recognition
• Speaker recognition
• Speaker verification
• Word spotting
• Automatic indexing of speech recordings
Other Speech Applications
• Speaker Verification for secure access to
   premises, information, virtual spaces
• Speaker Recognition for legal and forensic purposes—national
   security; also for personalized services
• Speech Enhancement for use in noisy environments, to eliminate
   echo, to align voices with video segments, to change voice
   qualities, to speed-up or slow-down prerecorded speech
   (e.g., talking books, rapid review of material, careful scrutinizing of
   spoken material, etc) => potentially to improve intelligibility and
   naturalness of speech
• Language Translation to convert spoken words in one language to
   another to facilitate natural language dialogues between people
   speaking different languages, i.e., tourists, business people
Speech/DSP Enabled Devices
Digital Speech Processing
• DSP:
– obtaining discrete representations of speech signal
– theory, design and implementation of numerical procedures
(algorithms) for processing the discrete representation in order to
achieve a goal (recognizing the signal, modifying the time scale
of the signal, removing background noise from the signal, etc.)
•Why DSP
– reliability
– flexibility
– accuracy
– real-time implementations on inexpensive dsp chips
– ability to integrate with multimedia and data
– encryptability/security of the data and the data representations
via suitable techniques
What We Will Be Learning
• Review some basic DSP concepts
• Speech production model—acoustics, articulatory concepts,
   speech production models
• Speech perception model—ear models, auditory signal
   processing, equivalent acoustic processing models
• Time domain processing concepts—speech properties, pitch,
   voiced-unvoiced, energy, autocorrelation, zero-crossing rates
• Short time Fourier analysis methods—digital filter banks,
   spectrograms, analysis-synthesis systems, vocoders
• Homomorphic speech processing—cepstrum, pitch detection,
   formant estimation, homomorphic vocoder
What We Will Be Learning…
• Linear predictive coding methods—autocorrelation
   method, covariance method, lattice methods, relation to vocal
   tract models
• Speech waveform coding and source models—delta
   modulation, PCM, mu-law, ADPCM, vector
   quantization, multipulse coding, CELP coding
• Methods for speech synthesis and text-to-speech systems—
   physical models, formant models, articulatory
   models, concatenative models
• Methods for speech recognition—the Hidden Markov Model
   (HMM)

More Related Content

What's hot

Digital signal processing
Digital signal processingDigital signal processing
Digital signal processingvanikeerthika
 
Diversity Techniques in Wireless Communication
Diversity Techniques in Wireless CommunicationDiversity Techniques in Wireless Communication
Diversity Techniques in Wireless CommunicationSahar Foroughi
 
Speaker recognition using MFCC
Speaker recognition using MFCCSpeaker recognition using MFCC
Speaker recognition using MFCCHira Shaukat
 
Speech coding techniques
Speech coding techniquesSpeech coding techniques
Speech coding techniquesHemaraja Nayaka S
 
Introduction to Digital Signal Processing
Introduction to Digital Signal ProcessingIntroduction to Digital Signal Processing
Introduction to Digital Signal Processingop205
 
Antennas slideshare part 1
Antennas slideshare part 1Antennas slideshare part 1
Antennas slideshare part 1Muthumanickam
 
Cognitive radio networks
Cognitive radio networksCognitive radio networks
Cognitive radio networksAmeer Sameer
 
Link power and rise time budget analysis
Link power and rise time budget analysisLink power and rise time budget analysis
Link power and rise time budget analysisCKSunith1
 
2.5 capacity calculations of fdma, tdma and cdma
2.5   capacity calculations of fdma, tdma and cdma2.5   capacity calculations of fdma, tdma and cdma
2.5 capacity calculations of fdma, tdma and cdmaJAIGANESH SEKAR
 
Unit 5 next generation networks
Unit 5   next generation networksUnit 5   next generation networks
Unit 5 next generation networksJAIGANESH SEKAR
 
Diversity Techniques in mobile communications
Diversity Techniques in mobile communicationsDiversity Techniques in mobile communications
Diversity Techniques in mobile communicationsDiwaker Pant
 
Speech Recognition
Speech Recognition Speech Recognition
Speech Recognition Goa App
 
Introduction to DSP.ppt
Introduction to DSP.pptIntroduction to DSP.ppt
Introduction to DSP.pptDr.YNM
 
DIGITAL SIGNAL PROCESSING
DIGITAL SIGNAL PROCESSINGDIGITAL SIGNAL PROCESSING
DIGITAL SIGNAL PROCESSINGSnehal Hedau
 
Coherent and Non-coherent detection of ASK, FSK AND QASK
Coherent and Non-coherent detection of ASK, FSK AND QASKCoherent and Non-coherent detection of ASK, FSK AND QASK
Coherent and Non-coherent detection of ASK, FSK AND QASKnaimish12
 
Final Wireless communication PPT
Final Wireless communication PPTFinal Wireless communication PPT
Final Wireless communication PPTMelkamu Deressa
 
DIGITAL COMMUNICATION: ENCODING AND DECODING OF CYCLIC CODE
DIGITAL COMMUNICATION: ENCODING AND DECODING OF CYCLIC CODE DIGITAL COMMUNICATION: ENCODING AND DECODING OF CYCLIC CODE
DIGITAL COMMUNICATION: ENCODING AND DECODING OF CYCLIC CODE ShivangiSingh241
 
Adaptive filter
Adaptive filterAdaptive filter
Adaptive filterA. Shamel
 
Lecture5 teletraffic
Lecture5 teletrafficLecture5 teletraffic
Lecture5 teletrafficmazlina1202
 

What's hot (20)

Digital signal processing
Digital signal processingDigital signal processing
Digital signal processing
 
Diversity Techniques in Wireless Communication
Diversity Techniques in Wireless CommunicationDiversity Techniques in Wireless Communication
Diversity Techniques in Wireless Communication
 
Speaker recognition using MFCC
Speaker recognition using MFCCSpeaker recognition using MFCC
Speaker recognition using MFCC
 
Speech coding techniques
Speech coding techniquesSpeech coding techniques
Speech coding techniques
 
Introduction to Digital Signal Processing
Introduction to Digital Signal ProcessingIntroduction to Digital Signal Processing
Introduction to Digital Signal Processing
 
Antennas slideshare part 1
Antennas slideshare part 1Antennas slideshare part 1
Antennas slideshare part 1
 
Cognitive radio networks
Cognitive radio networksCognitive radio networks
Cognitive radio networks
 
Link power and rise time budget analysis
Link power and rise time budget analysisLink power and rise time budget analysis
Link power and rise time budget analysis
 
2.5 capacity calculations of fdma, tdma and cdma
2.5   capacity calculations of fdma, tdma and cdma2.5   capacity calculations of fdma, tdma and cdma
2.5 capacity calculations of fdma, tdma and cdma
 
Unit 5 next generation networks
Unit 5   next generation networksUnit 5   next generation networks
Unit 5 next generation networks
 
Diversity Techniques in mobile communications
Diversity Techniques in mobile communicationsDiversity Techniques in mobile communications
Diversity Techniques in mobile communications
 
Speech Recognition
Speech Recognition Speech Recognition
Speech Recognition
 
Introduction to DSP.ppt
Introduction to DSP.pptIntroduction to DSP.ppt
Introduction to DSP.ppt
 
DIGITAL SIGNAL PROCESSING
DIGITAL SIGNAL PROCESSINGDIGITAL SIGNAL PROCESSING
DIGITAL SIGNAL PROCESSING
 
Linear Predictive Coding
Linear Predictive CodingLinear Predictive Coding
Linear Predictive Coding
 
Coherent and Non-coherent detection of ASK, FSK AND QASK
Coherent and Non-coherent detection of ASK, FSK AND QASKCoherent and Non-coherent detection of ASK, FSK AND QASK
Coherent and Non-coherent detection of ASK, FSK AND QASK
 
Final Wireless communication PPT
Final Wireless communication PPTFinal Wireless communication PPT
Final Wireless communication PPT
 
DIGITAL COMMUNICATION: ENCODING AND DECODING OF CYCLIC CODE
DIGITAL COMMUNICATION: ENCODING AND DECODING OF CYCLIC CODE DIGITAL COMMUNICATION: ENCODING AND DECODING OF CYCLIC CODE
DIGITAL COMMUNICATION: ENCODING AND DECODING OF CYCLIC CODE
 
Adaptive filter
Adaptive filterAdaptive filter
Adaptive filter
 
Lecture5 teletraffic
Lecture5 teletrafficLecture5 teletraffic
Lecture5 teletraffic
 

Viewers also liked

speech processing basics
speech processing basicsspeech processing basics
speech processing basicssivakumar m
 
Essential linguistics Chap 3 part 1 Graphic Organizer
Essential linguistics Chap 3 part 1 Graphic OrganizerEssential linguistics Chap 3 part 1 Graphic Organizer
Essential linguistics Chap 3 part 1 Graphic Organizersheilacook
 
Ppt on speech processing by ranbeer
Ppt on speech processing by ranbeerPpt on speech processing by ranbeer
Ppt on speech processing by ranbeerRanbeer Tyagi
 
Speech Signal Processing
Speech Signal ProcessingSpeech Signal Processing
Speech Signal ProcessingMurtadha Alsabbagh
 
Physiology of speech
Physiology of speechPhysiology of speech
Physiology of speechRaghu Veer
 
Speech signal processing lizy
Speech signal processing lizySpeech signal processing lizy
Speech signal processing lizyLizy Abraham
 
Radio Communication
Radio CommunicationRadio Communication
Radio CommunicationJohn Grace
 
Radio communication presentation
Radio communication presentationRadio communication presentation
Radio communication presentationrandan88
 
Gsm.....ppt
Gsm.....pptGsm.....ppt
Gsm.....pptbalu008
 

Viewers also liked (12)

speech processing basics
speech processing basicsspeech processing basics
speech processing basics
 
Dif fft
Dif fftDif fft
Dif fft
 
Essential linguistics Chap 3 part 1 Graphic Organizer
Essential linguistics Chap 3 part 1 Graphic OrganizerEssential linguistics Chap 3 part 1 Graphic Organizer
Essential linguistics Chap 3 part 1 Graphic Organizer
 
Ppt on speech processing by ranbeer
Ppt on speech processing by ranbeerPpt on speech processing by ranbeer
Ppt on speech processing by ranbeer
 
Speech Signal Processing
Speech Signal ProcessingSpeech Signal Processing
Speech Signal Processing
 
Physiology of speech
Physiology of speechPhysiology of speech
Physiology of speech
 
Speech signal processing lizy
Speech signal processing lizySpeech signal processing lizy
Speech signal processing lizy
 
Radio Presentation
Radio PresentationRadio Presentation
Radio Presentation
 
Radio Communication
Radio CommunicationRadio Communication
Radio Communication
 
Radio communication presentation
Radio communication presentationRadio communication presentation
Radio communication presentation
 
Dsp ppt
Dsp pptDsp ppt
Dsp ppt
 
Gsm.....ppt
Gsm.....pptGsm.....ppt
Gsm.....ppt
 

Similar to Digital Speech Processing Course Overview

Speechrecognition 100423091251-phpapp01
Speechrecognition 100423091251-phpapp01Speechrecognition 100423091251-phpapp01
Speechrecognition 100423091251-phpapp01girishjoshi1234
 
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionZachary S. Brown
 
Voice recognition system
Voice recognition systemVoice recognition system
Voice recognition systemavinash raibole
 
Speech Recognition in Artificail Inteligence
Speech Recognition in Artificail InteligenceSpeech Recognition in Artificail Inteligence
Speech Recognition in Artificail InteligenceIlhaan Marwat
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition TechnologySrijanKumar18
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognitionCharu Joshi
 
Iitdmj 1
Iitdmj 1Iitdmj 1
Iitdmj 1Ram Yadav
 
Speech recognizers & generators
Speech recognizers & generatorsSpeech recognizers & generators
Speech recognizers & generatorsPaul Kahoro
 
Artificial Intelligence- An Introduction
Artificial Intelligence- An IntroductionArtificial Intelligence- An Introduction
Artificial Intelligence- An Introductionacemindia
 
Artificial Intelligence - An Introduction
Artificial Intelligence - An Introduction Artificial Intelligence - An Introduction
Artificial Intelligence - An Introduction acemindia
 
Speech-Recognition.pptx
Speech-Recognition.pptxSpeech-Recognition.pptx
Speech-Recognition.pptxJyothiMedisetty2
 
How speech reorganization works
How speech reorganization worksHow speech reorganization works
How speech reorganization worksMuhammad Taqi
 
WomenTech_Event
WomenTech_EventWomenTech_Event
WomenTech_EventThin Zar Phyo
 
Artificial intelligence - research areas
Artificial intelligence - research areasArtificial intelligence - research areas
Artificial intelligence - research areasLearnbay Datascience
 

Similar to Digital Speech Processing Course Overview (20)

Speechrecognition 100423091251-phpapp01
Speechrecognition 100423091251-phpapp01Speechrecognition 100423091251-phpapp01
Speechrecognition 100423091251-phpapp01
 
Speech recognition (dr. m. sabarimalai manikandan)
Speech recognition (dr. m. sabarimalai manikandan)Speech recognition (dr. m. sabarimalai manikandan)
Speech recognition (dr. m. sabarimalai manikandan)
 
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech RecognitionTeaching Machines to Listen: An Introduction to Automatic Speech Recognition
Teaching Machines to Listen: An Introduction to Automatic Speech Recognition
 
Voice recognition system
Voice recognition systemVoice recognition system
Voice recognition system
 
Speech Recognition in Artificail Inteligence
Speech Recognition in Artificail InteligenceSpeech Recognition in Artificail Inteligence
Speech Recognition in Artificail Inteligence
 
Speech Recognition Technology
Speech Recognition TechnologySpeech Recognition Technology
Speech Recognition Technology
 
Speech recognition
Speech recognitionSpeech recognition
Speech recognition
 
Iitdmj 1
Iitdmj 1Iitdmj 1
Iitdmj 1
 
Speech recognizers & generators
Speech recognizers & generatorsSpeech recognizers & generators
Speech recognizers & generators
 
Artificial Intelligence- An Introduction
Artificial Intelligence- An IntroductionArtificial Intelligence- An Introduction
Artificial Intelligence- An Introduction
 
Artificial Intelligence - An Introduction
Artificial Intelligence - An Introduction Artificial Intelligence - An Introduction
Artificial Intelligence - An Introduction
 
Speech-Recognition.pptx
Speech-Recognition.pptxSpeech-Recognition.pptx
Speech-Recognition.pptx
 
Web AI.pptx
Web AI.pptxWeb AI.pptx
Web AI.pptx
 
[IJET-V1I6P21] Authors : Easwari.N , Ponmuthuramalingam.P
[IJET-V1I6P21] Authors : Easwari.N , Ponmuthuramalingam.P[IJET-V1I6P21] Authors : Easwari.N , Ponmuthuramalingam.P
[IJET-V1I6P21] Authors : Easwari.N , Ponmuthuramalingam.P
 
How speech reorganization works
How speech reorganization worksHow speech reorganization works
How speech reorganization works
 
Universal design HCI
Universal design HCIUniversal design HCI
Universal design HCI
 
WomenTech_Event
WomenTech_EventWomenTech_Event
WomenTech_Event
 
Artificial intelligence - research areas
Artificial intelligence - research areasArtificial intelligence - research areas
Artificial intelligence - research areas
 
Assign
AssignAssign
Assign
 
Amadou
AmadouAmadou
Amadou
 

Recently uploaded

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 

Recently uploaded (20)

Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning era
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 

Digital Speech Processing Course Overview

  • 1. Digital Speech Processing Dept. of Computer Science & Engineering Shahjalal University of Science & Technology
  • 2. Course Description – Review of digital signal processing – Fundamentals of speech production and perception – Basic techniques for digital speech processing: • short - time energy, magnitude, autocorrelation • short - time Fourier analysis • homomorphic methods • linear predictive methods
  • 3. Course Description… – Speech estimation methods • speech/non-speech detection • voiced/unvoiced/non-speech segmentation/classification • pitch detection • formant estimation – Applications of speech signal processing • Speech coding • Speech synthesis • Speech recognition/natural language processing
  • 4. Book Information Textbook: L. R. Rabiner and R. W. Schafer, -Theory and Applications of Digital Speech Processing, Prentice-Hall Inc., 2011 Recommended Supplementary Textbook: • T. F. Quatieri, Principles of Discrete - Time Speech Processing, Prentice Hall Inc, 2002 Laboratory works: Using Matlab
  • 5. Speech Processing Speech is one of the most intriguing signals that humans work with every day. • Purpose of speech processing: – To understand speech as a means of communication; – To represent speech for transmission and reproduction; – To analyze speech for automatic recognition and extraction of information – To discover some physiological characteristics of the talker.
  • 6. The Speech Stack • Speech Applications — coding, synthesis, recognition, understanding, ve rification, language translation, speed-up/slow- down • Speech Algorithms —speech-silence (background), voiced-unvoiced decision, pitch detection, formant estimation • Speech Representations — temporal, spectral, homomorphic, LPC • Fundamentals — acoustics, linguistics, pragmatics, speech perception
  • 8. Speech Coding Speech Coding is the process of transforming a speech signal into a representation for efficient transmission and storage of speech – narrowband and broadband wired telephony – cellular communications – Voice over IP (VoIP) to utilize the Internet as a real-time communications medium – secure voice for privacy and encryption for national security applications – extremely narrowband communications channels, e.g., battlefield applications using HF radio – storage of speech for telephone answering machines, prerecorded messages
  • 9. Speech Synthesis Synthesis of Speech is the process of generating a speech signal using computational means for effective human- machine interactions.
  • 10. Speech Synthesis… – machine reading of text or email messages – talking agents for automatic transactions – automatic agent in customer care call center – handheld devices such as foreign language phrasebooks, dictionaries, crossword puzzle helpers – announcement machines that provide information such as stock quotes, airlines schedules, weather reports, etc.
  • 11. Speech Synthesis Examples Natural Synthetic
  • 12. Speech Recognition and understanding Recognition and Understanding of Speech is the process of extracting usable linguistic information from a speech signal in support of human-machine communication by voice – command and control (C&C) applications, e.g., simple commands for spreadsheets, presentation graphics, Appliances – voice dictation to create letters, memos, and other documents – natural language voice dialogues with machines to enable Help desks, Call Centers – voice dialing for cellphones and from PDA’s and other small devices – agent services such as calendar entry and update, address list modification and entry, etc.
  • 15. Pattern Matching Problems • Speech recognition • Speaker recognition • Speaker verification • Word spotting • Automatic indexing of speech recordings
  • 16. Other Speech Applications • Speaker Verification for secure access to premises, information, virtual spaces • Speaker Recognition for legal and forensic purposes—national security; also for personalized services • Speech Enhancement for use in noisy environments, to eliminate echo, to align voices with video segments, to change voice qualities, to speed-up or slow-down prerecorded speech (e.g., talking books, rapid review of material, careful scrutinizing of spoken material, etc) => potentially to improve intelligibility and naturalness of speech • Language Translation to convert spoken words in one language to another to facilitate natural language dialogues between people speaking different languages, i.e., tourists, business people
  • 18. Digital Speech Processing • DSP: – obtaining discrete representations of speech signal – theory, design and implementation of numerical procedures (algorithms) for processing the discrete representation in order to achieve a goal (recognizing the signal, modifying the time scale of the signal, removing background noise from the signal, etc.) •Why DSP – reliability – flexibility – accuracy – real-time implementations on inexpensive dsp chips – ability to integrate with multimedia and data – encryptability/security of the data and the data representations via suitable techniques
  • 19. What We Will Be Learning • Review some basic DSP concepts • Speech production model—acoustics, articulatory concepts, speech production models • Speech perception model—ear models, auditory signal processing, equivalent acoustic processing models • Time domain processing concepts—speech properties, pitch, voiced-unvoiced, energy, autocorrelation, zero-crossing rates • Short time Fourier analysis methods—digital filter banks, spectrograms, analysis-synthesis systems, vocoders • Homomorphic speech processing—cepstrum, pitch detection, formant estimation, homomorphic vocoder
  • 20. What We Will Be Learning… • Linear predictive coding methods—autocorrelation method, covariance method, lattice methods, relation to vocal tract models • Speech waveform coding and source models—delta modulation, PCM, mu-law, ADPCM, vector quantization, multipulse coding, CELP coding • Methods for speech synthesis and text-to-speech systems— physical models, formant models, articulatory models, concatenative models • Methods for speech recognition—the Hidden Markov Model (HMM)