BASICS OF SPEECH PROCESSING
(VOCODER)
Ashish Maurya
(M.Tech 2nd year)
What is Speech?
• Speech is composed of phonemes, which are produced by the vocal cords and the vocal tract (which includes
the mouth and the lips).
• It is the ability to express thoughts and feelings through vocalized sounds.
• Speech can be further divided into voiced and unvoiced speech:
• Voiced speech: voiced signals are produced when the vocal cords vibrate during the pronunciation of a phoneme. Voiced signals tend to be louder, like the vowels /a/, /e/, /i/, /o/, /u/.
• Unvoiced speech: unvoiced signals, on the other hand, tend to be more abrupt, like the stop consonants /p/, /t/, /k/.
Human Speech Production System
When we speak:
• Air is pushed from the lungs through the vocal tract, and speech comes out of the mouth.
• For voiced sounds, the vocal cords vibrate (open and close). The rate at which they vibrate determines the pitch of the voice.
• Women and young children tend to have a high pitch (fast vibration), while adult males tend to have a low pitch (slow vibration).
• For unvoiced sounds, such as certain fricatives and plosives, the vocal cords do not vibrate but remain open.
• The shape of the vocal tract determines the sound that is produced.
• As we speak, the vocal tract changes its shape, producing different sounds.
• The shape of the vocal tract changes relatively slowly (on a time scale of 10 ms to 100 ms).
Mathematical Model of Human Speech System
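The figure for this slide did not survive extraction. As a hedged illustration, assuming the usual source-filter model of speech production: the excitation is an impulse train spaced one pitch period apart (voiced) or white noise (unvoiced), and it is passed through an all-pole vocal-tract filter 1/A(z). The function names, the 2nd-order filter coefficients and the 100 Hz pitch below are illustrative choices, not values from the slides.

```python
import random

def excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Impulse train (voiced) or white Gaussian noise (unvoiced)."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def all_pole_filter(x, a, gain=1.0):
    """y[n] = gain*x[n] - sum_{k=1..p} a[k]*y[n-k], i.e. H(z) = gain / A(z)
    with A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p (a[0] is assumed to be 1)."""
    y = []
    for n, xn in enumerate(x):
        acc = gain * xn
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

fs = 8000
a = [1.0, -1.3, 0.9]  # hypothetical 2nd-order "vocal tract" with one resonance
voiced = all_pole_filter(excitation(400, voiced=True, pitch_period=fs // 100), a)
unvoiced = all_pole_filter(excitation(400, voiced=False), a)
```

Changing `pitch_period` changes the pitch (80 samples at 8 kHz corresponds to 100 Hz), while changing `a` moves the resonances, i.e. the sound being produced.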
Speech Coding
• It is a procedure to represent a digitized speech signal using as few bits as possible while
maintaining the speech quality.
• It converts a speech signal at a high bit rate into a representation with a lower bit rate.
• It is a speech compression process.
Speech codec in mobile phone technology
Basic purpose of speech coding techniques
Speech Coding Performance Attributes
• The primary requirements of the speech coders are:
• Low bit rate: less bandwidth is required for transmission, leading to a more cost-efficient system.
• High speech quality: high SNR and PESQ scores.
• Other desirable requirements are:
• Robustness across different speakers / languages: Must support different speakers
(adult male, adult female, and children) and different languages.
• Robustness in the presence of channel errors
• Good performance on non-speech signals such as telephone signaling tones
• Low memory size and low computational complexity
• Low coding delay
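SNR, the first of the quality measures above, compares the original samples s[n] with the decoded samples ŝ[n]: SNR = 10·log10(Σ s[n]² / Σ (s[n] − ŝ[n])²) dB. A minimal sketch follows (the sample values are made up); PESQ is a perceptual model standardized as ITU-T P.862 and is not reproduced here.

```python
import math

def snr_db(original, decoded):
    """Signal-to-noise ratio in dB between an original and a decoded signal."""
    if len(original) != len(decoded):
        raise ValueError("signals must have the same length")
    signal_energy = sum(s * s for s in original)
    noise_energy = sum((s - d) ** 2 for s, d in zip(original, decoded))
    if noise_energy == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / noise_energy)

s = [1000.0, -2000.0, 1500.0, -500.0]       # hypothetical original samples
s_hat = [990.0, -1990.0, 1510.0, -505.0]    # hypothetical decoded samples
print(round(snr_db(s, s_hat), 2))
```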
Speech Coder
• In speech codecs, speech is represented in the form of a code and the code is stored or transmitted.
• The implementation of a speech codec essentially means the implementation of a speech coder.
• The operation of the speech decoder depends on the method of coding employed in the speech coder.
Figure: Classification of speech coders (based on coding technique): waveform coders, voice coders (vocoders), hybrid coders and analysis-by-synthesis coders.
• Waveform coders:
• They preserve the original shape of the signal waveform.
• Better suited to higher bit rates, e.g. PCM and ADPCM.
• Voice coders (vocoders):
• The speech signal is assumed to be generated by a model controlled by a small set of parameters.
• During encoding, the parameters of the model are estimated from the input speech signal.
• The parameters are then transmitted as the encoded bit-stream.
• The quality of the decoded speech depends on how well the model fits the speech.
• Low bit-rate coders.
• Example: LPC-10 (2.4 kbps), MELP
• Hybrid coders:
• Hybrid coders combine the strengths of waveform coders and vocoders.
• Additional parameters of the model are optimized so that the decoded speech is as close as possible to the original waveform.
• Medium bit-rate coders.
• Analysis-by-synthesis coders:
• An improved form of vocoder.
• Candidate excitation signals are taken from a codebook and passed through the synthesis filter.
• The encoder compares each synthesized signal with the original and selects the one with the minimum (perceptually weighted) error, i.e. the best perceptual match.
• The parameters representing the best excitation signal and the corresponding synthesis filter are then sent to the decoder.
• Example: G.729 (CS-ACELP) and other ACELP speech coders
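The analysis-by-synthesis loop described above can be sketched as follows. This is a simplified illustration, not the G.729 search: the codebook is a tiny hypothetical list of excitation vectors, the error is plain squared error rather than perceptually weighted error, and the optimal gain for each candidate is computed in closed form.

```python
def synthesize(excitation, a):
    """Zero-state all-pole synthesis filter 1/A(z); a[0] is assumed to be 1."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the codevector whose gain-scaled
    synthesized output is closest to `target` in squared error."""
    best = None
    for i, c in enumerate(codebook):
        y = synthesize(c, a)
        energy = dot(y, y)
        if energy == 0.0:
            continue
        gain = dot(target, y) / energy                  # optimal gain for this candidate
        error = dot(target, target) - gain * dot(target, y)
        if best is None or error < best[2]:
            best = (i, gain, error)
    return best

a = [1.0, -0.9]                                          # hypothetical synthesis filter
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, 0, -1, 0], [0, 0, 1, 1]]
target = synthesize([0, 2.0, 0, 0], a)                   # made from codevector 1, gain 2
print(search_codebook(target, codebook, a))
```

Because the gain is solved in closed form, minimizing the error is equivalent to maximizing ⟨t, y⟩²/⟨y, y⟩ over the candidates, which is the quantity codebook searches typically evaluate.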
Applications
• Mobile VoIP
• Audio and video conferencing
• VoIP services
• Wi-Fi phones (VoWLAN)
• Wireless GPRS/EDGE systems
• Wideband IP telephony
• Transcoding between vocoders
• Fax over IP / fax relay
Speech Coder Modules
ITU-T G-series
• G.711 – 64 kbps PCM (A-law or μ-law)
• G.722 – 7 kHz audio coding within 64 kbps (SB-ADPCM)
• G.722.1 – 24 kbps and 32 kbps, 7 kHz audio
• G.722.2 – Adaptive Multi-Rate Wideband (GSM AMR-WB)
• G.723.1 – 5.3 kbps and 6.3 kbps ACELP/MP-MLQ
• G.726 – 16, 24, 32 and 40 kbps ADPCM
• G.727 – 5-, 4-, 3- and 2-bit/sample embedded ADPCM
• G.728 – 16 kbps LD-CELP
• G.729/G.729A – 8 kbps CS-ACELP
• G.729 Annex B – silence compression (voice activity detection)
• GSM speech coders (ETSI standards)
• GSM-FR – GSM 06.10 Full Rate Vocoder
• GSM-HR – GSM 06.20 Half Rate Vocoder
• GSM-EFR – GSM 06.60 Enhanced Full Rate Vocoder
• GSM-AMR – GSM 06.90 Adaptive Multi-Rate Vocoder
• GSM-AMR-WB – 3GPP TS 26.171 Adaptive Multi-Rate Wideband (ITU-T G.722.2)
• Wideband speech coders with support for HD audio
• G.722 – 7 kHz audio coding within 64 kbps (SB-ADPCM)
• G.722.1 – 24 kbps and 32 kbps, 7 kHz audio
• G.722.2 – 6.6 kbps to 23.85 kbps, 7 kHz audio (GSM AMR-WB)
• Speex – 8 kHz, 16 kHz and 32 kHz CELP
• SILK – variable-bit-rate wideband speech codec
• iLBC – Internet Low Bitrate Codec
• LPC-10 – 2.4 kbps linear predictive coding vocoder
• MELP – Mixed-Excitation Linear Prediction
• MELPe – Mixed-Excitation Linear Prediction, enhanced
• Opus – Interactive Audio Codec
• Voice Activity Detection (VAD)
• Packet Loss Concealment (PLC)
• Adaptive jitter buffer
• Voice/Facsimile/Data Modem Detection
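G.711, listed above, is a waveform coder that compands each sample before 8-bit quantization. The sketch below uses the continuous μ-law curve F(x) = sgn(x)·ln(1 + μ|x|)/ln(1 + μ) with μ = 255 and a uniform 8-bit quantizer on the companded value. G.711 itself specifies a segmented (piecewise-linear) approximation and its own code layout, which this sketch does not reproduce; the helper names are hypothetical.

```python
import math

MU = 255.0

def mu_law_compress(x):
    """Continuous mu-law companding of x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)

def encode(x, bits=8):
    """Compand, then quantize uniformly to a signed integer code."""
    levels = 2 ** (bits - 1) - 1          # 127 for 8 bits
    return round(mu_law_compress(x) * levels)

def decode(code, bits=8):
    levels = 2 ** (bits - 1) - 1
    return mu_law_expand(code / levels)

# Relative error stays small across a wide range of amplitudes.
for x in (0.001, 0.01, 0.1, 0.9):
    x_hat = decode(encode(x))
    print(f"x={x:<6} code={encode(x):4d} relative error={abs(x - x_hat) / x:.3f}")
```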
Comparison of a few speech coders with respect to bit rate, quality and delay
G.729 Speech Codec
• Input speech: sampled at 8 kHz with 16-bit PCM, so 8 kHz × 16 bits = 128 kbps.
• Uses input frames of 10 ms, i.e. 80 samples, for analysis.
• Each frame has two subframes of 5 ms (40 samples).
• Encoding rate: 80 bits per 10 ms frame = 8 kbps.
• Output speech: the decoder reconstructs 16-bit PCM speech at 128 kbps.
G.729 Speech codec
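The framing above (10 ms frames of 80 samples, each split into two 5 ms subframes of 40 samples) can be sketched as below. The helper name is hypothetical, and a trailing partial frame is simply dropped; a real encoder works on a continuous stream and also uses look-ahead samples, which this sketch ignores.

```python
FS = 8000                              # samples per second
FRAME_MS = 10
FRAME_LEN = FS * FRAME_MS // 1000      # 80 samples
SUBFRAME_LEN = FRAME_LEN // 2          # 40 samples
BITS_PER_FRAME = 80

def split_frames(samples):
    """Yield (frame, (subframe1, subframe2)) for every complete 80-sample frame."""
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        yield frame, (frame[:SUBFRAME_LEN], frame[SUBFRAME_LEN:])

one_second = [0] * FS
frames = list(split_frames(one_second))
print(len(frames), "frames/s ->", len(frames) * BITS_PER_FRAME, "bit/s")
```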
Speech Compression in G.729
• Telephone-band speech has frequency content from about 300 Hz to 3.4 kHz (approximately 4 kHz bandwidth).
• The Nyquist theorem states that the sampling frequency must be at least twice the highest frequency in the continuous-time signal to avoid aliasing.
• Therefore, the standard sampling frequency (fs) for telephone speech is
fs = 2 × 4 kHz = 8 kHz
• With 16 bits per sample:
• Bit rate = 8 kHz × 16 bits = 128 kbps
• This bit rate is very high; the G.729 encoder reduces it to 8 kbps.
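The arithmetic above, together with the per-frame bit budget, can be checked in a few lines. The 18-bit LSP figure and the split of the 14 adaptive-codebook bits into 8 + 5 delay bits plus 1 parity bit come from the G.729 recommendation rather than from this slide; the 34 fixed-codebook bits and 14 gain bits match the counts in the G.729 block diagram.

```python
fs_hz = 2 * 4000                   # Nyquist: twice the ~4 kHz telephone bandwidth
bits_per_sample = 16
input_rate = fs_hz * bits_per_sample           # bit/s before coding

# Bits per 10 ms frame (two 5 ms subframes), following the G.729 bit allocation.
frame_bits = {
    "LSP (quantized LP parameters)": 18,
    "adaptive codebook delay + parity": 8 + 5 + 1,
    "fixed codebook positions + signs": 2 * (13 + 4),
    "codebook gains": 2 * 7,
}
bits_per_frame = sum(frame_bits.values())
frames_per_second = 1000 // 10
coded_rate = bits_per_frame * frames_per_second

print(input_rate, coded_rate, input_rate // coded_rate)
```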
Figure: G.729 encoder and decoder block diagram.
Encoder: input speech (16-bit PCM signal), pre-processing, LP analysis, quantization and interpolation (LPC information), synthesis filter, perceptual weighting, pitch analysis, adaptive codebook [14 bits] and fixed codebook search [34 bits] (each producing a 40-sample codevector), gain quantization (codebook gain, 14 bits), parameter encoding, encoded bit-stream.
Decoder: bit-stream segmentation, adaptive codebook [14 bits] and fixed codebook [34 bits] contributions scaled by Gp and Gc, short-term filter, post-processing, output speech.
Gp: adaptive codebook gain; Gc: fixed codebook gain.
How Does G.729 Work?
• These operations are performed once per frame:
1. Pre-processing: the input is high-pass filtered and scaled down by a factor of 2.
2. LP (linear prediction) analysis: linear prediction is used to model the signal; the LP coefficients are converted to line spectral pair (LSP) coefficients, which are less sensitive to quantization noise.
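Steps 1 and 2 can be sketched as below. The high-pass coefficients are those of the G.729 pre-processing filter (140 Hz cutoff, with the division by 2 folded into the numerator) as given in the recommendation, and are worth checking against it; the LP part computes autocorrelations and runs the Levinson-Durbin recursion for a 10th-order predictor. G.729's asymmetric analysis window, lag windowing, LSP conversion and fixed-point arithmetic are omitted, so this is a floating-point illustration rather than a bit-exact implementation.

```python
import random

# G.729 pre-processing high-pass filter (140 Hz cutoff, includes the 1/2 scaling).
HP_B = (0.46363718, -0.92724705, 0.46363718)
HP_A = (1.0, -1.9059465, 0.9114024)

def preprocess(x):
    """Direct-form I biquad: y[n] = sum b[k] x[n-k] - a1 y[n-1] - a2 y[n-2]."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = (HP_B[0] * xn + HP_B[1] * x1 + HP_B[2] * x2
              - HP_A[1] * y1 - HP_A[2] * y2)
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y

def autocorrelation(x, order):
    """r[k] = sum_n x[n] x[n-k] for k = 0..order (no analysis window applied)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Return (a, err): A(z) = 1 + a[1] z^-1 + ... + a[order] z^-order minimizes
    the prediction error energy err for autocorrelation sequence r."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # i-th reflection coefficient
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

# Stand-in for one 240-sample analysis window (random noise, not real speech).
rng = random.Random(1)
window = [rng.gauss(0.0, 1000.0) for _ in range(240)]
s = preprocess(window)
lp, residual_energy = levinson_durbin(autocorrelation(s, 10), 10)
```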
How Does G.729 Work? (contd.)
3. Quantization: the LSP coefficients are quantized and used throughout the rest of the algorithm.
4. Open-loop pitch analysis: exact pitch analysis is complex, so this step gives only a rough estimate of the pitch delay, which is refined later.
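A simplified open-loop pitch estimator in the spirit of step 4 picks the lag that maximizes the normalized correlation between the signal and its delayed copy. The 20 to 143 sample range matches G.729's integer pitch-delay range, but the recommendation searches three sub-ranges of the perceptually weighted speech and favours shorter delays to avoid picking pitch multiples; none of that is reproduced here, and the function name is hypothetical.

```python
import math

def open_loop_pitch(x, min_lag=20, max_lag=143):
    """Return the lag in [min_lag, max_lag] that maximizes the normalized
    correlation R(k) / sqrt(E(k)) between x[n] and x[n - k]."""
    best_lag, best_score = min_lag, -math.inf
    for k in range(min_lag, max_lag + 1):
        corr = sum(x[n] * x[n - k] for n in range(k, len(x)))
        energy = sum(x[n - k] ** 2 for n in range(k, len(x)))
        if energy <= 0.0:
            continue
        score = corr / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = k, score
    return best_lag

# Synthetic voiced-like excitation: one pulse every 57 samples.
pulses = [1.0 if n % 57 == 0 else 0.0 for n in range(400)]
lag = open_loop_pitch(pulses)
```

A 57-sample lag at 8 kHz corresponds to a pitch of about 140 Hz.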
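How Does G.729 Work? (contd.)
• These operations are performed once per subframe (twice per frame):
1. Adaptive codebook search: determines the exact pitch delay through a closed-loop search around the open-loop estimate.
2. Fixed codebook search: searches the algebraic codebook for the codevector that best models the remaining excitation not captured by the adaptive codebook (the dominant component for unvoiced speech).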
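The adaptive codebook can be pictured as the past excitation read back with a delay equal to the pitch lag T. A minimal integer-lag sketch follows; G.729 also uses fractional delays with 1/3-sample resolution via an interpolation filter, which is omitted. When T is shorter than the 40-sample subframe, the delayed segment runs into samples of the current subframe, so the vector is built sample by sample from itself, which repeats the last pitch cycle. The function name is hypothetical.

```python
SUBFRAME_LEN = 40

def adaptive_codevector(past_excitation, lag, length=SUBFRAME_LEN):
    """v[n] = u[n - lag] for n = 0..length-1, where u is the past excitation
    followed by v itself (so lags shorter than `length` repeat the last cycle)."""
    if lag < 1 or lag > len(past_excitation):
        raise ValueError("lag must be between 1 and len(past_excitation)")
    u = list(past_excitation)
    start = len(u)
    for n in range(length):
        u.append(u[start + n - lag])
    return u[start:]

past = list(range(100))                 # stand-in for the past excitation
v_long = adaptive_codevector(past, 60)  # lag >= 40: a plain delayed copy
v_short = adaptive_codevector(past, 25) # lag < 40: last 25-sample cycle repeated
```

The closed-loop search evaluates such vectors for lags around the open-loop estimate and keeps the one whose filtered version best matches the target signal.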
G.729-Based Speech Codec Process Flow
Figure: G.729-based speech codec process (encoder process flow and decoder process flow)
Specifications of G.729 Speech Codec:
• Speech coding algorithm: conjugate-structure algebraic code-excited linear prediction (CS-ACELP);
• Speech sample stream: 16-bit PCM;
• Sampling rate: 8000 samples per second;
• Codec operates on: 10 ms speech frames corresponding to 80 samples;
• Input speech: 128 kbps;
• Speech compression rate: 8 kbps;
• Compression ratio: 16:1;
• Arithmetic operations: fixed-point;
• Algorithmic delay: 15 ms (10 ms frame plus 5 ms look-ahead);
• Linear-prediction filter: 10th order;
• Windowing: asymmetric LP analysis window made of half a Hamming window and a quarter of a cosine cycle;
• Conversion of autocorrelation coefficients to LP coefficients: Levinson-Durbin algorithm;
• Adaptive codebook search for pitch delay: correlation method and interpolation filter;
• Adaptive codebook: 14 bits per frame (8 bits in the 1st subframe, 5 bits in the 2nd subframe, plus 1 parity bit);
• Fixed codebook structure: algebraic codebook using an interleaved single-pulse permutation (ISSP) design;
• Fixed codebook search approach: focused search (G.729) and depth-first tree search (G.729A);
• Fixed codebook size: 34 bits per frame (17 bits per subframe);
• Excitation parameter determination: adaptive and fixed codebook searches are performed per 5 ms subframe;
• Output speech: 16-bit PCM, 128 kbps.
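The 17 bits per subframe of the ISSP algebraic codebook break down as four signed unit pulses: pulses 0 to 2 each have 8 possible positions (3 bits), pulse 3 has 16 (4 bits), and each pulse has a sign bit, giving 13 + 4 = 17. The interleaved track layout below follows the G.729 fixed codebook; the helper names are hypothetical and packing the bits into the bit-stream is not shown.

```python
SUBFRAME_LEN = 40

# Interleaved single-pulse permutation tracks (positions 0..39 of a subframe).
TRACKS = [
    list(range(0, SUBFRAME_LEN, 5)),   # pulse 0: 0, 5, ..., 35
    list(range(1, SUBFRAME_LEN, 5)),   # pulse 1: 1, 6, ..., 36
    list(range(2, SUBFRAME_LEN, 5)),   # pulse 2: 2, 7, ..., 37
    sorted(list(range(3, SUBFRAME_LEN, 5)) + list(range(4, SUBFRAME_LEN, 5))),  # pulse 3
]

def position_bits(track):
    """Bits needed to index one position on this track."""
    return (len(track) - 1).bit_length()

def fixed_codevector(positions, signs):
    """Place one unit pulse per track at `positions`, with signs +1 or -1,
    in an otherwise zero 40-sample vector."""
    if len(positions) != len(TRACKS) or len(signs) != len(TRACKS):
        raise ValueError("need exactly one position and one sign per track")
    c = [0] * SUBFRAME_LEN
    for track, pos, sign in zip(TRACKS, positions, signs):
        if pos not in track or sign not in (1, -1):
            raise ValueError("invalid pulse position or sign")
        c[pos] = sign
    return c

bits_per_subframe = sum(position_bits(t) for t in TRACKS) + len(TRACKS)  # positions + signs
print(bits_per_subframe, 2 * bits_per_subframe)
```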

More Related Content

What's hot

Concept of Diversity & Fading (wireless communication)
Concept of Diversity & Fading (wireless communication)Concept of Diversity & Fading (wireless communication)
Concept of Diversity & Fading (wireless communication)Omkar Rane
 
Pulse modulation
Pulse modulationPulse modulation
Pulse modulationstk_gpg
 
M ary psk and m ary qam ppt
M ary psk and m ary qam pptM ary psk and m ary qam ppt
M ary psk and m ary qam pptDANISHAMIN950
 
Pulse code modulation and Demodulation
Pulse code modulation and DemodulationPulse code modulation and Demodulation
Pulse code modulation and DemodulationAbdul Razaq
 
Ec 2401 wireless communication unit 3
Ec 2401 wireless communication   unit 3Ec 2401 wireless communication   unit 3
Ec 2401 wireless communication unit 3JAIGANESH SEKAR
 
4.4 diversity combining techniques
4.4   diversity combining techniques4.4   diversity combining techniques
4.4 diversity combining techniquesJAIGANESH SEKAR
 
Equalization
EqualizationEqualization
Equalizationbhabendu
 
10. types of small scale fading
10. types of small scale fading10. types of small scale fading
10. types of small scale fadingJAIGANESH SEKAR
 
Ec 2401 wireless communication unit 4
Ec 2401 wireless communication   unit 4Ec 2401 wireless communication   unit 4
Ec 2401 wireless communication unit 4JAIGANESH SEKAR
 
linear equalizer and turbo equalizer
linear equalizer and turbo equalizerlinear equalizer and turbo equalizer
linear equalizer and turbo equalizerDivya_mtech
 
Frequency hopping spread spectrum
Frequency hopping spread spectrumFrequency hopping spread spectrum
Frequency hopping spread spectrumHarshit Gupta
 
Adaptive linear equalizer
Adaptive linear equalizerAdaptive linear equalizer
Adaptive linear equalizerSophia Jeanne
 
Frequency shift keying report
Frequency shift keying reportFrequency shift keying report
Frequency shift keying reportestherleah21
 
Speaker recognition using MFCC
Speaker recognition using MFCCSpeaker recognition using MFCC
Speaker recognition using MFCCHira Shaukat
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By MatlabAnkit Gujrati
 

What's hot (20)

Linear Predictive Coding
Linear Predictive CodingLinear Predictive Coding
Linear Predictive Coding
 
Adaptive equalization
Adaptive equalizationAdaptive equalization
Adaptive equalization
 
Concept of Diversity & Fading (wireless communication)
Concept of Diversity & Fading (wireless communication)Concept of Diversity & Fading (wireless communication)
Concept of Diversity & Fading (wireless communication)
 
Pulse modulation
Pulse modulationPulse modulation
Pulse modulation
 
M ary psk and m ary qam ppt
M ary psk and m ary qam pptM ary psk and m ary qam ppt
M ary psk and m ary qam ppt
 
Pulse code modulation and Demodulation
Pulse code modulation and DemodulationPulse code modulation and Demodulation
Pulse code modulation and Demodulation
 
Ec 2401 wireless communication unit 3
Ec 2401 wireless communication   unit 3Ec 2401 wireless communication   unit 3
Ec 2401 wireless communication unit 3
 
4.4 diversity combining techniques
4.4   diversity combining techniques4.4   diversity combining techniques
4.4 diversity combining techniques
 
Equalization
EqualizationEqualization
Equalization
 
Audio compression
Audio compressionAudio compression
Audio compression
 
10. types of small scale fading
10. types of small scale fading10. types of small scale fading
10. types of small scale fading
 
Ec 2401 wireless communication unit 4
Ec 2401 wireless communication   unit 4Ec 2401 wireless communication   unit 4
Ec 2401 wireless communication unit 4
 
linear equalizer and turbo equalizer
linear equalizer and turbo equalizerlinear equalizer and turbo equalizer
linear equalizer and turbo equalizer
 
Frequency hopping spread spectrum
Frequency hopping spread spectrumFrequency hopping spread spectrum
Frequency hopping spread spectrum
 
Adaptive linear equalizer
Adaptive linear equalizerAdaptive linear equalizer
Adaptive linear equalizer
 
Frequency shift keying report
Frequency shift keying reportFrequency shift keying report
Frequency shift keying report
 
NON PARAMETRIC METHOD
NON PARAMETRIC METHODNON PARAMETRIC METHOD
NON PARAMETRIC METHOD
 
Speaker recognition using MFCC
Speaker recognition using MFCCSpeaker recognition using MFCC
Speaker recognition using MFCC
 
Speech Recognition System By Matlab
Speech Recognition System By MatlabSpeech Recognition System By Matlab
Speech Recognition System By Matlab
 
Frequency Modulation
Frequency ModulationFrequency Modulation
Frequency Modulation
 

Similar to SPEECH CODING

Multimedia Compression and Communication
Multimedia Compression and CommunicationMultimedia Compression and Communication
Multimedia Compression and CommunicationBenesh Selvanesan
 
adaptive multirate speech coding
adaptive multirate speech codingadaptive multirate speech coding
adaptive multirate speech codingAbhiram Subhagan
 
Multimedia seminar ppt
Multimedia seminar pptMultimedia seminar ppt
Multimedia seminar pptAnandi Kumari
 
Basics of audio coding
Basics of audio codingBasics of audio coding
Basics of audio codingsakshij91
 
Audio and video compression
Audio and video compressionAudio and video compression
Audio and video compressionneeraj9217
 
Speech coding standards2
Speech coding standards2Speech coding standards2
Speech coding standards2elroy25
 
An audio quality evaluation of digital radio system
An audio quality evaluation of digital radio systemAn audio quality evaluation of digital radio system
An audio quality evaluation of digital radio systemRojith Thomas
 
An audio quality evaluation of digital radio system
An audio quality evaluation of digital radio systemAn audio quality evaluation of digital radio system
An audio quality evaluation of digital radio systemRojith Thomas
 
Audio Compression_2023.pptx
Audio Compression_2023.pptxAudio Compression_2023.pptx
Audio Compression_2023.pptxzulhelmanz
 
Interactive Voice Con
Interactive Voice ConInteractive Voice Con
Interactive Voice ConDru Wynings
 
Multimedia Systems by Sahil Punni
Multimedia Systems by Sahil PunniMultimedia Systems by Sahil Punni
Multimedia Systems by Sahil PunniSahil Punni
 
Digitization of Audio.ppt
Digitization of Audio.pptDigitization of Audio.ppt
Digitization of Audio.pptVideoguy
 
Voice Interface - QX3440.ppt
Voice Interface - QX3440.pptVoice Interface - QX3440.ppt
Voice Interface - QX3440.pptAngelMoreno606965
 

Similar to SPEECH CODING (20)

Multimedia Compression and Communication
Multimedia Compression and CommunicationMultimedia Compression and Communication
Multimedia Compression and Communication
 
adaptive multirate speech coding
adaptive multirate speech codingadaptive multirate speech coding
adaptive multirate speech coding
 
Multimedia seminar ppt
Multimedia seminar pptMultimedia seminar ppt
Multimedia seminar ppt
 
Ijetr021253
Ijetr021253Ijetr021253
Ijetr021253
 
Basics of audio coding
Basics of audio codingBasics of audio coding
Basics of audio coding
 
Audio and video compression
Audio and video compressionAudio and video compression
Audio and video compression
 
Digital audio
Digital audioDigital audio
Digital audio
 
Harmonic speech coding
Harmonic speech codingHarmonic speech coding
Harmonic speech coding
 
add9.5.ppt
add9.5.pptadd9.5.ppt
add9.5.ppt
 
Multimedia Services: Audio
Multimedia Services: AudioMultimedia Services: Audio
Multimedia Services: Audio
 
Speech coding standards2
Speech coding standards2Speech coding standards2
Speech coding standards2
 
An audio quality evaluation of digital radio system
An audio quality evaluation of digital radio systemAn audio quality evaluation of digital radio system
An audio quality evaluation of digital radio system
 
An audio quality evaluation of digital radio system
An audio quality evaluation of digital radio systemAn audio quality evaluation of digital radio system
An audio quality evaluation of digital radio system
 
Mk3422222228
Mk3422222228Mk3422222228
Mk3422222228
 
Audio Compression_2023.pptx
Audio Compression_2023.pptxAudio Compression_2023.pptx
Audio Compression_2023.pptx
 
Interactive Voice Con
Interactive Voice ConInteractive Voice Con
Interactive Voice Con
 
Multimedia Systems by Sahil Punni
Multimedia Systems by Sahil PunniMultimedia Systems by Sahil Punni
Multimedia Systems by Sahil Punni
 
Digitization of Audio.ppt
Digitization of Audio.pptDigitization of Audio.ppt
Digitization of Audio.ppt
 
Digital audio
Digital audioDigital audio
Digital audio
 
Voice Interface - QX3440.ppt
Voice Interface - QX3440.pptVoice Interface - QX3440.ppt
Voice Interface - QX3440.ppt
 

Recently uploaded

(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)Suman Mia
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVRajaP95
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130Suhani Kapoor
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )Tsuyoshi Horigome
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 

Recently uploaded (20)

(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)Software Development Life Cycle By  Team Orange (Dept. of Pharmacy)
Software Development Life Cycle By Team Orange (Dept. of Pharmacy)
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IVHARMONY IN THE NATURE AND EXISTENCE - Unit-IV
HARMONY IN THE NATURE AND EXISTENCE - Unit-IV
 
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
VIP Call Girls Service Kondapur Hyderabad Call +91-8250192130
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
(RIA) Call Girls Bhosari ( 7001035870 ) HI-Fi Pune Escorts Service
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )SPICE PARK APR2024 ( 6,793 SPICE Models )
SPICE PARK APR2024 ( 6,793 SPICE Models )
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 

SPEECH CODING

  • 1. BASICS OF SPEECH PROCESSING (VOCODER) Ashish Maurya (M.Tech 2nd year)
  • 2. What is Speech? • Speech is composed of phonemes, which are produced by the vocal cords and the vocal tract (which includes the mouth and the lips). • It is the ability to express the thoughts and feelings by vocalize sounds.
  • 3. • Voiced Speech: Voiced signals are produced Unvoiced Speech: Unvoiced signals, on the other hand, when the vocal cords vibrate during the tend to be more abrupt like the stop consonants pronunciation of a phoneme. /p/, /t/, /k/ Voiced signals tend to be louder like the vowels /a/, /e/, /i/, /o/, /u/. • Speech can be further divided in Voiced and Unvoiced speech.
  • 4. Human Speech Production System When we speak: • Air is pushed from lung through vocal tract and out of mouth comes speech. • For certain voiced sound, vocal cords vibrate (open and close). The rate at which the vocal cords vibrate determines the pitch of voice. • Women and young children tend to have high pitch (fast vibration) while adult males tend to have low pitch (slow vibration). • For certain fricatives and plosive (or unvoiced) sound, vocal cords do not vibrate but remain constantly opened. • The shape of vocal tract determines the sound that you make. • As we speak, our vocal tract changes its shape producing different sound. • The shape of the vocal tract changes relatively slowly (on the scale of 10 msec to 100 msec).
  • 5. Mathematical Model of Human Speech System
  • 6. Speech Coding • It is a procedure to represent a digitized speech signal using as few bits as possible while maintaining the speech quality. • It is the process to convert speech signal with higher bit rate to lower bit rate. • It is a speech compression process. Speech codec in mobile phone technology
  • 7. Basic purpose of speech coding techniques
  • 8. Speech Coding Performance Attributes • The primary requirements of the speech coders are: • Low bit-rate: Less bandwidth is required for transmission, leading to a more cost- efficient system. • High speech quality: Good SNR and PESQ values. • Other desirable requirements are: • Robustness across different speakers / languages: Must support different speakers (adult male, adult female, and children) and different languages. • Robustness in the presence of channel errors • Good Performance on non-speech signals such as telephone signaling tones • Low memory size and low computational complexity • Low coding delay
  • 9. Speech Coder • In speech codecs, speech is represented in the form of a code and the code is stored or transmitted. • The implementation of a speech codec essentially means the implementation of a speech coder. • The operation of the speech decoder depends on the method of coding employed in the speech coder. Waveform Coders Analysis-by- Synthesis Coders Hybrid Coders Voice Coders (Vocoder) Classification of speech coders (Based on coding techniques)
  • 10. • Waveform Coders: • It preserve the original shape of the signal waveform. • Better suited for higher bit -rate coders e.g. PCM, ADPCM. • Voice Coders (Vocoders): • Speech signal is assumed to be generated from a model which is controlled by some parameters. • During encoding, parameters of the model are estimated from the input speech signal. • Then the parameters are transmitted as the encoded bit-stream. • Quality of the decoded speech depends on model. • Low bit rate coder. • Example : G.729 Vocoder
  • 11. • Hybrid Coders • Hybrid coders combine the strength of Waveform coders and Vocoders. • Additional parameters of the model are optimized such that the decoded speech is as close as possible to the original waveform. • Medium bit rate coder. • Analysis-by-Synthesis Coders • Improved form of vocoders • Synthesized signal are extracted from the given codebook structure. • Find the best perceptual match to the original speech by comparing each synthesized signal to the original one with minimum error. • The parameters representing the best excitation signal and corresponding production filters are then send over to the decoder. • Example: G.729 ACELP, CS-ACELP Speech coders
  • 12. Applications • Mobile VoIP • Audio and video conferencing • VoIP services • WIFI phones VoWLAN • Wireless GPRS EDGE systems • Wideband IP telephony • Transcoding /transcode between Vocoders • Fax over IP/Fax Relay
  • 13. Speech Coder Modules: ITU-T G-series
    • G.711 – 64 kbps PCM (A-law or μ-law)
    • G.722 – 7 kHz audio coding within 64 kbps (SB-ADPCM)
    • G.722.1 – 24 kbps and 32 kbps, 7 kHz audio
    • G.722.2 – Adaptive Multi-Rate Wideband (GSM AMR-WB)
    • G.723.1 – 5.3 kbps ACELP and 6.3 kbps MP-MLQ
    • G.726 – 16, 24, 32 and 40 kbps ADPCM
    • G.727 – 5-, 4-, 3- and 2-bit/sample embedded ADPCM
    • G.728 – 16 kbps LD-CELP
    • G.729/G.729A – 8 kbps CS-ACELP
    • G.729 Annex B – silence compression (voice activity detection)
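As an example of the simplest entry in this list, the sketch below shows μ-law companding in the spirit of G.711. It is an assumption-laden illustration: it uses the continuous μ-law curve with μ = 255 and a plain uniform quantizer to integer codes in [-127, 127], whereas G.711 itself specifies a segmented (piecewise-linear) approximation with its own code tables. The function names are hypothetical. With 8-bit codes at 8000 samples per second, the rate is 8 × 8000 = 64 kbps.

```python
import math

MU = 255.0  # mu-law parameter


def mu_law_encode(x):
    """Compress x in [-1, 1] with the continuous mu-law curve, then quantize to an integer in [-127, 127]."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return round(y * 127)


def mu_law_decode(code):
    """Map an integer code back to [-1, 1] by inverting the mu-law curve."""
    y = code / 127
    return math.copysign(((1.0 + MU) ** abs(y) - 1.0) / MU, y)


# Small and large amplitudes get similar *relative* accuracy.
for x in (0.01, 0.1, 0.9):
    code = mu_law_encode(x)
    print(x, code, round(mu_law_decode(code), 4))
```

The logarithmic curve spends more codes on quiet samples, which is why 8-bit companded PCM sounds comparable to roughly 13-bit or 14-bit uniform PCM for speech.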
  • 14. GSM speech coders (fully compliant with ETSI standards)
    • GSM-FR – GSM 06.10 Full Rate vocoder
    • GSM-HR – GSM 06.20 Half Rate vocoder
    • GSM-EFR – GSM 06.60 Enhanced Full Rate vocoder
    • GSM-AMR – GSM 06.90 Adaptive Multi-Rate vocoder
    • GSM-AMR-WB – 3GPP TS 26.171 Adaptive Multi-Rate Wideband (ITU-T G.722.2)
  • Wideband speech coders with support for HD audio
    • G.722 – 7 kHz audio coding within 64 kbps (SB-ADPCM)
    • G.722.1 – 24 kbps and 32 kbps, 7 kHz audio
    • G.722.2 – 6.6 kbps to 23.85 kbps, 7 kHz audio (GSM AMR-WB)
    • Speex – 8 kHz, 16 kHz and 32 kHz CELP
    • SILK – variable bit-rate wideband speech codec
  • 15. iLBC – Internet Low Bitrate Codec
  • LPC-10 – 2.4 kbps linear predictive coding vocoder (US Federal Standard 1015)
  • MELP – Mixed-Excitation Linear Prediction
  • MELPe – Mixed-Excitation Linear Prediction, enhanced
  • Speex – 8 kHz, 16 kHz and 32 kHz CELP
  • SILK – variable bit-rate wideband speech codec
  • Opus – interactive audio codec
  • Voice activity detection (VAD)
  • Packet loss concealment (PLC)
  • Adaptive jitter buffering
  • Voice/facsimile/data modem detection
  • 16. Figure: comparison of a few speech coders with respect to bit-rate, quality and delay
  • 17. G.729 Speech Codec
    • Input speech: 16-bit PCM sampled at 8 kHz, i.e. 8000 × 16 = 128 kbps.
    • Analysis uses 10 ms input frames, i.e. 80 samples.
    • Each frame is divided into two 5 ms subframes of 40 samples each.
    • Encoding rate: 80 bits per 10 ms frame = 8 kbps.
    • Output speech: the encoded bit-stream is decoded back to 16-bit PCM at 8 kHz (128 kbps).
  • Figure: G.729 speech codec
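The frame and subframe sizes above follow directly from the 8 kHz sampling rate. The sketch below derives them and splits a PCM sample list accordingly. The helper `split_into_frames` is hypothetical; it drops trailing samples that do not fill a whole frame and ignores the encoder's 5 ms look-ahead.

```python
FS = 8000                        # sampling rate, samples per second
FRAME_LEN = FS * 10 // 1000      # 10 ms frame   -> 80 samples
SUBFRAME_LEN = FS * 5 // 1000    # 5 ms subframe -> 40 samples


def split_into_frames(samples):
    """Split PCM samples into 10 ms frames, each holding two 5 ms subframes.

    Trailing samples that do not fill a complete frame are dropped.
    """
    frames = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        frame = samples[start:start + FRAME_LEN]
        frames.append((frame[:SUBFRAME_LEN], frame[SUBFRAME_LEN:]))
    return frames


one_second = [0] * FS
frames = split_into_frames(one_second)
print(len(frames), FRAME_LEN, SUBFRAME_LEN)
```

One second of speech therefore yields 100 frames, and at 80 bits per frame that is the 8 kbps encoding rate.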
  • 18. Speech Compression in G.729
    • Telephone-band speech has frequency content from about 300 Hz to 3.4 kHz, i.e. below 4 kHz.
    • The Nyquist theorem states that the sampling frequency must be at least twice the highest frequency in the continuous-time signal to avoid aliasing.
    • Therefore, the standard sampling frequency for telephone speech is fs = 2 × 4 kHz = 8 kHz.
    • If each sample is represented with 16 bits, the bit-rate is 8 kHz × 16 bits = 128 kbps.
    • This bit-rate is very high; the G.729 encoder reduces it to 8 kbps.
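The arithmetic on this slide can be checked in a few lines. The only assumption is the rounding of the 3.4 kHz band edge up to 4 kHz, as the slide does; `nyquist_rate` is a hypothetical helper name.

```python
def nyquist_rate(highest_freq_hz):
    """Minimum sampling rate that avoids aliasing for a baseband signal."""
    return 2 * highest_freq_hz


fs = nyquist_rate(4000)            # 3.4 kHz telephone band rounded up to 4 kHz -> 8000 Hz
pcm_bps = fs * 16                  # 16 bits per sample -> 128000 bit/s
frames_per_second = 1000 // 10     # one G.729 frame every 10 ms -> 100 frames/s
g729_bps = 80 * frames_per_second  # 80 bits per frame -> 8000 bit/s
print(fs, pcm_bps, g729_bps, f"{pcm_bps // g729_bps}:1")
```

This prints `8000 128000 8000 16:1`, matching the 16:1 compression ratio listed in the G.729 specifications.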
  • 19. Figure: G.729 encoder and decoder block diagram
    • Encoder (input: 16-bit PCM speech): pre-processing, LP analysis, quantization and interpolation (LPC information), perceptual weighting, pitch analysis, adaptive codebook [14 bits] giving a 40-sample adaptive codevector, fixed codebook search [34 bits] giving a 40-sample fixed codebook vector, gain quantization (codebook gains, 14 bits) and parameter encoding into the encoded bit-stream.
    • Decoder: bit-stream segmentation, adaptive codebook [14 bits] scaled by Gp, fixed codebook [34 bits] scaled by Gc, short-term (synthesis) filter and post-processing, giving the output speech.
    • Gp: adaptive codebook gain; Gc: fixed codebook gain.
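The decoder path in the diagram (two codebook vectors scaled by Gp and Gc, then the short-term filter) can be sketched as follows. This assumes A(z) = 1 + a1 z^-1 + ... + a10 z^-10 with already-decoded coefficients and gains; LSP decoding, interpolation, gain decoding and the post-filter are omitted. `build_excitation` and `short_term_synthesis` are hypothetical names, and the filter memory is carried across subframes as a real decoder must do.

```python
def build_excitation(adaptive_vec, fixed_vec, gp, gc):
    """Excitation u[n] = Gp*v[n] + Gc*c[n] for one subframe."""
    return [gp * v + gc * c for v, c in zip(adaptive_vec, fixed_vec)]


def short_term_synthesis(excitation, a, memory):
    """Filter through 1/A(z), A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    `memory` holds the last p output samples (oldest first) and is updated in
    place, so the filter state carries over from one subframe to the next.
    """
    assert len(memory) == len(a)
    out = []
    for e in excitation:
        s = e
        for k, a_k in enumerate(a, start=1):
            s -= a_k * memory[-k]  # memory[-k] is s[n-k]
        out.append(s)
        memory.append(s)
        memory.pop(0)
    return out


# Usage with a hypothetical 1st-order filter (G.729 uses 10th order).
a = [-0.5]
mem = [0.0]
u = build_excitation([1.0, 0.0, 0.0], [0.0, 1.0, 0.0], gp=2.0, gc=1.0)
print(short_term_synthesis(u, a, mem), mem)
```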
  • 20. How Does G.729 Work?
  • These operations are performed once per frame:
    1. Pre-processing: the input signal is scaled down by a factor of 2 and passed through a high-pass filter (cut-off around 140 Hz).
    2. LP (linear prediction) analysis: linear prediction is used to model the signal; the LP coefficients are converted to line spectral pairs (LSPs), which are less sensitive to quantization noise.
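The pre-processing step can be sketched as a second-order IIR filter. The coefficients below are those of the G.729 pre-processing high-pass filter as we recall them from the Recommendation, with the divide-by-2 already folded into the numerator; treat them as an assumption and verify against the standard before relying on them. The reference implementation is fixed-point, while this sketch uses floating point, and `preprocess` is a hypothetical name.

```python
# ASSUMED coefficients of the G.729 pre-processing high-pass filter (verify
# against the Recommendation). The 1/2 scaling is folded into the numerator B,
# which is why the passband gain is about 0.5.
B = (0.46363718, -0.92724705, 0.46363718)
A = (1.9059465, -0.9114024)  # feedback: y[n] += A[0]*y[n-1] + A[1]*y[n-2]


def preprocess(x):
    """Scale by 1/2 and high-pass filter a list of samples (cut-off about 140 Hz)."""
    x1 = x2 = y1 = y2 = 0.0  # filter memory: x[n-1], x[n-2], y[n-1], y[n-2]
    out = []
    for s in x:
        y = B[0] * s + B[1] * x1 + B[2] * x2 + A[0] * y1 + A[1] * y2
        out.append(y)
        x2, x1 = x1, s
        y2, y1 = y1, y
    return out


dc = preprocess([1000.0] * 4000)                              # constant offset is strongly attenuated
alt = preprocess([1000.0 * (-1) ** n for n in range(4000)])   # 4 kHz tone is roughly halved
print(round(dc[-1], 1), round(abs(alt[-1]), 1))
```

Removing the low-frequency content (including any DC offset from the A/D converter) and halving the level protects the fixed-point arithmetic of the later stages from overflow.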
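  • 21. How Does G.729 Work? (contd.)
    3. Quantization: the LSPs are quantized, and the quantized parameters are used throughout the rest of the algorithm.
    4. Open-loop pitch analysis: since accurate pitch analysis is computationally expensive, this step gives a rough estimate of the pitch delay, which the adaptive codebook search later refines.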
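A rough open-loop pitch estimate can be obtained by picking the lag that maximizes a normalized correlation between the current frame and the past signal. This is a simplified sketch: it searches lags 20 to 143 directly on the input, whereas G.729 works on the perceptually weighted speech and splits the range into sections with a bias toward shorter delays to avoid picking pitch multiples. `open_loop_pitch` is a hypothetical name.

```python
import math


def open_loop_pitch(x, frame_start, frame_len=80, min_lag=20, max_lag=143):
    """Rough pitch delay: the lag that maximizes normalized correlation.

    x must hold at least max_lag samples of history before frame_start.
    Returns the best lag in samples (at 8 kHz, lag 80 corresponds to 100 Hz).
    """
    frame = x[frame_start:frame_start + frame_len]
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        past = x[frame_start - lag:frame_start - lag + frame_len]
        energy = sum(p * p for p in past)
        if energy == 0.0:
            continue
        score = sum(f * p for f, p in zip(frame, past)) / math.sqrt(energy)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Usage: a synthetic "voiced" signal with one pulse every 57 samples (about 140 Hz).
signal = [1.0 if n % 57 == 0 else 0.0 for n in range(400)]
print(open_loop_pitch(signal, frame_start=200))
```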
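  • 22. How Does G.729 Work? (contd.)
  • These operations are performed twice per frame, i.e. once per subframe:
    1. Adaptive codebook search: determines the exact pitch delay through a closed-loop search around the open-loop estimate.
    2. Fixed codebook search: finds the best codevector in the fixed (algebraic) codebook to model the part of the excitation the adaptive codebook cannot predict (the noise-like component, which dominates in unvoiced speech).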
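The candidates examined by the adaptive codebook search are delayed copies of the past excitation. The sketch below builds such a codevector for an integer delay, repeating the last pitch cycle when the delay is shorter than the 40-sample subframe. G.729 also uses fractional delays (1/3-sample resolution) obtained by interpolation, which this sketch omits; `adaptive_codevector` is a hypothetical name.

```python
def adaptive_codevector(past_excitation, delay, length=40):
    """Integer-delay adaptive codebook vector v[n] = u[n - delay], n = 0..length-1.

    past_excitation holds earlier excitation samples, most recent last, and
    must contain at least `delay` samples. When delay < length, samples not
    yet available are taken from v itself, i.e. the last pitch cycle repeats.
    """
    if delay < 1 or delay > len(past_excitation):
        raise ValueError("delay must be between 1 and len(past_excitation)")
    v = []
    for n in range(length):
        if n < delay:
            v.append(past_excitation[len(past_excitation) - delay + n])
        else:
            v.append(v[n - delay])
    return v


past = [0.0, 0.0, 5.0, -1.0, 2.0]
print(adaptive_codevector(past, delay=3, length=7))
```

In the closed-loop search, each candidate vector is passed through the synthesis filter and compared with the target signal, in the same analysis-by-synthesis fashion as the fixed codebook search.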
  • 23. G.729-based speech codec process flow
  • Figure: G.729-based speech codec process (encoder process flow and decoder process flow)
  • 24. Specifications of the G.729 Speech Codec:
    • Speech coding algorithm: conjugate-structure algebraic code-excited linear prediction (CS-ACELP)
    • Speech sample stream: 16-bit PCM
    • Sampling rate: 8000 samples per second
    • Frame size: 10 ms speech frames, corresponding to 80 samples
    • Input speech: 128 kbps
    • Compressed rate: 8 kbps
    • Compression ratio: 16:1
    • Arithmetic: fixed-point
    • Algorithmic delay: 15 ms (10 ms frame plus 5 ms look-ahead)
    • Linear-prediction filter: 10th order
    • LP analysis window: asymmetric, half a Hamming window followed by a quarter of a cosine cycle
    • Conversion of autocorrelation coefficients to LP coefficients: Levinson-Durbin algorithm
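The Levinson-Durbin step listed above can be sketched as follows. The sketch assumes A(z) = 1 + a1 z^-1 + ... + ap z^-p, so the recursion solves Σ_j a_j r[|i-j|] = 0 for i = 1..p with a_0 = 1. The analysis window and the autocorrelation conditioning that G.729 applies before this step are omitted, and the function names are hypothetical.

```python
def autocorrelation(x, order):
    """Biased autocorrelation r[0..order] of a (windowed) frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns (a, err) where a[0] == 1.0 and err is the final prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


# AR(1) autocorrelation r[k] = 0.9**k: expect a close to [1, -0.9, 0].
print(levinson_durbin([1.0, 0.9, 0.81], 2))
```

The recursion needs on the order of p² operations instead of the p³ of a general linear solver, which matters for a 10th-order filter recomputed every 10 ms in fixed-point hardware.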
  • 25. Specifications of the G.729 Speech Codec (contd.):
    • Adaptive codebook search for pitch delay: correlation method with an interpolating filter (fractional delays)
    • Adaptive codebook size: 14 bits per frame (8 bits in the 1st subframe plus 1 parity bit, and 5 bits in the 2nd subframe)
    • Fixed codebook structure: algebraic codebook using an interleaved single-pulse permutation (ISPP) design
    • Fixed codebook search approach: focused search (G.729) and depth-first tree search (G.729A)
    • Fixed codebook size: 34 bits per frame (17 bits per subframe)
    • Excitation parameter determination: every 5 ms subframe, in the adaptive and fixed codebook searches
    • Output speech: 128 kbps
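The 17 bits per subframe of the fixed codebook can be accounted for by its ISPP track structure. This sketch assumes the G.729 layout as we understand it: four signed unit pulses per 40-sample subframe, three tracks of 8 positions (3 bits each) and one track of 16 positions (4 bits), plus one sign bit per pulse. The helper names are hypothetical.

```python
# ASSUMED pulse-position tracks of the G.729 algebraic codebook: one signed
# unit pulse per track in each 40-sample subframe.
TRACKS = [
    list(range(0, 40, 5)),                                  # 0, 5, ..., 35  (8 positions)
    list(range(1, 40, 5)),                                  # 1, 6, ..., 36  (8 positions)
    list(range(2, 40, 5)),                                  # 2, 7, ..., 37  (8 positions)
    sorted(list(range(3, 40, 5)) + list(range(4, 40, 5))),  # 3, 4, 8, 9, ..., 38, 39 (16 positions)
]


def fixed_codebook_bits():
    """Bits per subframe: position-index bits per track plus one sign bit per pulse."""
    position_bits = sum((len(track) - 1).bit_length() for track in TRACKS)  # 3 + 3 + 3 + 4
    sign_bits = len(TRACKS)
    return position_bits + sign_bits


def fixed_codevector(position_indices, signs, length=40):
    """Build c[n] from one position index and one sign (+1/-1) per track."""
    c = [0.0] * length
    for track, idx, sign in zip(TRACKS, position_indices, signs):
        c[track[idx]] += sign
    return c


print(fixed_codebook_bits(), 2 * fixed_codebook_bits())  # per subframe, per frame
c = fixed_codevector([0, 1, 2, 15], [+1, -1, +1, -1])
print([n for n, v in enumerate(c) if v != 0])             # pulse positions
```

Because every codevector has only four nonzero ±1 entries, the search needs no stored codebook and filtering a candidate reduces to adding four shifted impulse responses.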