BASICS OF SPEECH PROCESSING
(VOCODER)
Ashish Maurya
(M.Tech 2nd year)
What is Speech?
• Speech is composed of phonemes, which are produced by the vocal cords and the vocal tract (which includes
the mouth and the lips).
• It is the ability to express thoughts and feelings by vocalizing sounds.
• Speech can be further divided into voiced and unvoiced speech.
• Voiced speech: Voiced signals are produced when the vocal cords vibrate during the pronunciation of a phoneme. Voiced signals tend to be louder, like the vowels /a/, /e/, /i/, /o/, /u/.
• Unvoiced speech: Unvoiced signals, on the other hand, tend to be more abrupt, like the stop consonants /p/, /t/, /k/.
Human Speech Production System
When we speak:
• Air is pushed from the lungs through the vocal tract, and speech comes out of the mouth.
• For voiced sounds, the vocal cords vibrate (open and close). The rate at which the vocal cords vibrate determines the pitch of the voice.
• Women and young children tend to have a high pitch (fast vibration), while adult males tend to have a low pitch (slow vibration).
• For unvoiced sounds, such as certain fricatives and plosives, the vocal cords do not vibrate but remain open.
• The shape of the vocal tract determines the sound that is produced.
• As we speak, the vocal tract changes its shape, producing different sounds.
• The shape of the vocal tract changes relatively slowly (on the scale of 10 msec
to 100 msec).
Mathematical Model of Human Speech System
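A common mathematical model of speech production, consistent with the description above, is the source-filter model: an excitation signal drives a slowly varying all-pole filter that stands in for the vocal tract. Voiced sounds use a periodic impulse train whose period sets the pitch; unvoiced sounds use random noise. The sketch below assumes that model and the convention A(z) = 1 - Σ a_k z^-k; the function `synthesize_frame` and the one-coefficient filter in the example are hypothetical, chosen only for illustration.

```python
import random


def synthesize_frame(lpc, n_samples, voiced, pitch_period=80, gain=1.0, seed=0):
    """Source-filter sketch: excitation -> all-pole vocal-tract filter 1/A(z).

    lpc holds a_1..a_p for the assumed convention A(z) = 1 - sum_k a_k z^-k.
    """
    rng = random.Random(seed)
    if voiced:
        # Periodic impulse train: one glottal pulse every pitch_period samples.
        excitation = [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    else:
        # White Gaussian noise: turbulent airflow with the vocal cords open.
        excitation = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    out = []
    for n in range(n_samples):
        # s[n] = G * e[n] + sum_k a_k * s[n - k]
        s = gain * excitation[n]
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                s += a * out[n - k]
        out.append(s)
    return out


# One 10 ms frame at 8 kHz (80 samples); a 40-sample period is a 200 Hz pitch.
voiced = synthesize_frame([0.9], 80, voiced=True, pitch_period=40)
unvoiced = synthesize_frame([0.9], 80, voiced=False)
```

Real coders use a higher-order filter (G.729 uses 10 coefficients) and update it every frame, which matches the 10 to 100 ms rate at which the vocal tract changes shape.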
Speech Coding
• It is a procedure for representing a digitized speech signal with as few bits as possible while maintaining speech quality.
• It converts a speech signal at a higher bit rate into a representation at a lower bit rate.
• In other words, it is a speech compression process.
Speech codec in mobile phone technology
Basic purpose of speech coding techniques
Speech Coding Performance Attributes
• The primary requirements of the speech coders are:
• Low bit rate: less bandwidth is required for transmission, leading to a more cost-efficient system.
• High speech quality: good SNR and PESQ (Perceptual Evaluation of Speech Quality, ITU-T P.862) scores.
• Other desirable requirements are:
• Robustness across different speakers / languages: Must support different speakers
(adult male, adult female, and children) and different languages.
• Robustness in the presence of channel errors
• Good performance on non-speech signals such as telephone signaling tones
• Low memory size and low computational complexity
• Low coding delay
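SNR, one of the quality measures listed above, compares the energy of the original signal with the energy of the coding error. A minimal sketch, assuming sample-aligned original and decoded signals of equal length; `snr_db` is a hypothetical helper. PESQ is a separate perceptual model and is not reproduced here.

```python
import math


def snr_db(original, decoded):
    """SNR in dB = 10 * log10(sum x^2 / sum (x - y)^2)."""
    if len(original) != len(decoded):
        raise ValueError("signals must have the same length")
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, decoded))
    if noise_energy == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal_energy / noise_energy)


# Each decoded sample is off by 10% of the original amplitude: about 20 dB.
print(snr_db([1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9]))
```

Waveform coders are judged well by SNR; model-based coders do not try to reproduce the waveform sample by sample, which is why perceptual measures such as PESQ are also used.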
Speech Coder
• In speech codecs, speech is represented in the form of a code and the code is stored or transmitted.
• The implementation of a speech codec essentially means the implementation of a speech coder.
• The operation of the speech decoder depends on the method of coding employed in the speech coder.
Classification of speech coders (based on coding technique): waveform coders, voice coders (vocoders), hybrid coders, and analysis-by-synthesis coders.
• Waveform Coders:
• It preserves the original shape of the signal waveform.
• Better suited to higher bit rates, e.g., PCM, ADPCM.
• Voice Coders (Vocoders):
• Speech signal is assumed to be generated from a model which is controlled by some
parameters.
• During encoding, parameters of the model are estimated from the input speech signal.
• Then the parameters are transmitted as the encoded bit-stream.
• The quality of the decoded speech depends on the accuracy of the model.
• Low bit-rate coder.
• Example: LPC-10 (G.729 is an analysis-by-synthesis coder, described below).
• Hybrid Coders
• Hybrid coders combine the strengths of waveform coders and vocoders.
• Additional parameters of the model are optimized such that the decoded speech is as close as
possible to the original waveform.
• Medium bit rate coder.
• Analysis-by-Synthesis Coders
• Improved form of vocoders.
• Candidate excitation signals are taken from a given codebook structure and passed through the synthesis filter to produce synthesized signals.
• The encoder finds the best perceptual match to the original speech by comparing each synthesized signal with the original and selecting the one with the minimum error.
• The parameters representing the best excitation signal and the corresponding production (synthesis) filters are then sent to the decoder.
• Example: G.729 (CS-ACELP) and other CELP speech coders.
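The analysis-by-synthesis loop above can be sketched as follows. Assumptions: a toy four-entry codebook, an all-pole synthesis filter 1/A(z) with A(z) = 1 - Σ a_k z^-k and zero initial state, and plain squared error in place of the perceptually weighted error a real CELP coder uses. `synthesis_filter` and `abs_search` are hypothetical names. For each candidate the optimal gain is computed in closed form, so only the winning index and its gain need to be transmitted.

```python
def synthesis_filter(excitation, lpc):
    """All-pole filter 1/A(z): s[n] = e[n] + sum_k a_k * s[n-k], zero initial state."""
    out = []
    for n, e in enumerate(excitation):
        s = e
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                s += a * out[n - k]
        out.append(s)
    return out


def abs_search(target, codebook, lpc):
    """Synthesize every codevector and keep the one closest to the target.

    Returns (index, gain, squared_error) for the best candidate.
    """
    best = None
    for index, codevector in enumerate(codebook):
        y = synthesis_filter(codevector, lpc)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero candidate cannot match anything
        # Gain that minimizes ||target - gain * y||^2 for this candidate.
        gain = sum(t * v for t, v in zip(target, y)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


# Toy example: the target is codevector 1 synthesized with gain 2.
lpc = [0.5]
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [1, -1, 0, 0]]
target = [2.0 * v for v in synthesis_filter([0, 1, 0, 0], lpc)]
index, gain, error = abs_search(target, codebook, lpc)  # index 1, gain 2.0, error 0.0
```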
Applications
• Mobile VoIP
• Audio and video conferencing
• VoIP services
• Wi-Fi phones (VoWLAN)
• Wireless GPRS/EDGE systems
• Wideband IP telephony
• Transcoding between vocoders
• Fax over IP/Fax Relay
Speech Coder Modules
ITU-T G-series
• G.711 – 64kbps PCM (A-law or μ-law form)
• G.722 – 7 kHz audio coding within 64 kbit/s (SB-ADPCM)
• G.722.1 – 24kbps and 32kbps, 7 kHz audio
• G.722.2 – Adaptive Multi-Rate Wideband (GSM AMR-WB)
• G.723.1 – 5.3kbps and 6.4kbps ACELP/MP-MLQ
• G.726 – 16kbps, 24kbps, 32kbps and 40kbps ADPCM
• G.727 – 5-, 4-, 3- and 2-bit/sample embedded ADPCM
• G.728 – 16kbps LD-CELP
• G.729/G.729A – 8kbps CS-ACELP
• G.729 Annex B – Silence Detection
• GSM speech coders (ETSI standards)
• GSM-FR – GSM 06.10 Full Rate Vocoder
• GSM-HR – GSM 06.20 Half Rate Vocoder
• GSM-EFR – GSM 06.60 Enhanced Full Rate Vocoder
• GSM-AMR – GSM 06.90 Adaptive Multi-Rate Vocoder
• GSM-AMR-WB – 3GPP TS 26.171 Adaptive Multi-Rate Wideband (ITU G.722.2)
• Wideband Speech Coders with support for HD audio
• G.722 – 7 kHz audio coding within 64 kbit/s (SB-ADPCM)
• G.722.1 – 24kbps and 32kbps, 7 kHz audio
• G.722.2 – 6.6kbps to 23.85kbps, 7 kHz audio (GSM AMR-WB)
• Speex – 8 kHz, 16 kHz, and 32 kHz CELP
• SILK – Variable Bitrate Wideband Speech Codec
• iLBC – Internet Low Bitrate Codec
• LPC-10 – 2.4 kbps linear predictive coding vocoder
• MELP – Mixed-Excitation Linear Predictive
• MELPe – Mixed-Excitation Linear Predictive Enhanced
• Opus – Interactive Audio Codec
• Voice Activity Detection (VAD)
• Packet Loss Concealment (PLC)
• Adaptive jitter buffer
• Voice/Facsimile/Data Modem Detection
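G.711, at the top of the ITU-T list above, reaches 64 kbps by sending 8 bits per sample at 8 kHz, using logarithmic (A-law or μ-law) companding so that quiet samples get finer quantization steps than loud ones. The sketch below uses the continuous μ-law curve with μ = 255 followed by uniform 8-bit quantization; G.711 itself encodes a segmented, piecewise-linear approximation of this curve, so these codes are not bit-exact G.711 output. The function names are hypothetical.

```python
import math

MU = 255.0  # mu-law parameter


def mu_law_compress(x):
    """Map a sample in [-1, 1] onto the continuous mu-law curve."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_sample(x):
    """Compress, then quantize uniformly to one of 256 codes (8 bits)."""
    return round((mu_law_compress(x) + 1.0) / 2.0 * 255)


def decode_sample(code):
    """Map an 8-bit code back onto the curve, then expand."""
    return mu_law_expand(code / 255 * 2.0 - 1.0)


bit_rate_bps = 8000 * 8  # 8 kHz x 8 bits per sample = 64000 bit/s

for x in (0.01, 0.5):
    # The relative error stays at a few percent for both quiet and loud samples.
    print(x, decode_sample(encode_sample(x)))
```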
Comparison of a few speech coders with respect to bit rate, quality and delay
G.729 Speech Codec
• Input speech: 16-bit PCM sampled at 8 kHz, so 8 kHz × 16 bits = 128 kbps.
• Uses 10 ms input frames, equal to 80 samples, for analysis.
• Each frame has two 5 ms subframes of 40 samples each.
• Encoding rate: 80 bits per 10 ms frame = 8 kbps.
• Output speech: the encoded bit stream is decoded back to 128 kbps (16-bit PCM at 8 kHz).
G.729 Speech codec
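The frame arithmetic above can be checked in a few lines. The per-frame bit allocation in the sketch is the one commonly quoted for G.729 (18 bits of LSP indices, 14 bits of adaptive-codebook delay including 1 parity bit, 34 bits of fixed-codebook pulse positions and signs, 14 bits of gains); it is worth confirming against the ITU-T G.729 Recommendation. The constant names are hypothetical.

```python
SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 16
FRAME_MS = 10
SUBFRAMES_PER_FRAME = 2

# Bits transmitted per 10 ms G.729 frame (both subframes combined).
G729_FRAME_BITS = {
    "LSP (LP filter) indices": 18,
    "adaptive codebook delay": 8 + 5,  # 8 bits in subframe 1, 5 bits in subframe 2
    "delay parity bit": 1,
    "fixed codebook pulse positions": 2 * 13,
    "fixed codebook pulse signs": 2 * 4,
    "codebook gains (Gp, Gc)": 2 * 7,
}

samples_per_frame = SAMPLE_RATE_HZ * FRAME_MS // 1000            # 80
samples_per_subframe = samples_per_frame // SUBFRAMES_PER_FRAME  # 40
input_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE                     # 128000
bits_per_frame = sum(G729_FRAME_BITS.values())                   # 80
coded_bps = bits_per_frame * 1000 // FRAME_MS                    # 8000
compression_ratio = input_bps // coded_bps                       # 16
```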
Speech Compression in G.729
• Telephone-band speech has frequency content from about 300 Hz to 3.4 kHz (approx. 4 kHz).
• The Nyquist theorem states that the sampling frequency must be at least twice the highest frequency component of the continuous-time signal to avoid aliasing.
• Therefore, the standard sampling frequency (fs) for telephone speech is
fs = 2 × 4 kHz = 8 kHz
• If each sample is represented with 16 bits:
• Bit rate = 8 kHz × 16 bits = 128 kbps
• This bit rate is very high, and the G.729 speech encoder reduces it to 8 kbps.
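The need to sample at least twice the highest frequency can be seen numerically: sampled at 8 kHz, a 5 kHz tone (above fs/2 = 4 kHz) produces exactly the same samples as a 3 kHz tone, so the two can no longer be told apart. This is why telephone speech is low-pass filtered to below 4 kHz before sampling. A minimal sketch; `sample_tone` is a hypothetical helper.

```python
import math

FS = 8000  # sampling frequency in Hz


def sample_tone(freq_hz, n_samples, fs=FS):
    """Return n_samples of cos(2*pi*f*t) sampled at t = n / fs."""
    return [math.cos(2 * math.pi * freq_hz * n / fs) for n in range(n_samples)]


# 3 kHz lies below fs/2 = 4 kHz; 5 kHz lies above it and aliases to fs - 5 kHz = 3 kHz.
tone_3k = sample_tone(3000, 80)  # one 10 ms frame
tone_5k = sample_tone(5000, 80)
max_difference = max(abs(a - b) for a, b in zip(tone_3k, tone_5k))  # ~0 (rounding only)
```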
Figure: G.729 encoder and decoder block diagram.
Encoder: input speech (16-bit PCM signal) → pre-processing → LP analysis, quantization and interpolation (LPC information) → synthesis filter and perceptual weighting; pitch analysis; adaptive codebook [14 bits] and fixed codebook search [34 bits] over 40-sample codevectors; gain quantization of Gp and Gc [14 bits]; parameter encoding → encoded bit stream.
Decoder: bit-stream segmentation → adaptive codebook [14 bits] and fixed codebook [34 bits], scaled by Gp and Gc → short-term filter (LPC information) → post-processing → output speech.
Gp: adaptive codebook gain; Gc: fixed codebook gain.
How Does G.729 Work?
• These operations are performed once per frame:
1. Pre-processing: scales the signal down by a factor of 2 and passes it through a high-pass filter.
2. LP (linear prediction) analysis: uses linear prediction to model the signal; the LP coefficients are converted to line spectral pair (LSP) coefficients, which are less sensitive to quantization noise.
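The LP analysis step can be sketched with the autocorrelation method and the Levinson-Durbin recursion named in the G.729 specification list. This is a simplified sketch, not the G.729 routine: it skips the analysis window, lag windowing and the LSP conversion, and it assumes the predictor convention x[n] ≈ Σ a_k x[n-k]. `autocorrelation` and `levinson_durbin` are hypothetical names; G.729 uses order 10.

```python
def autocorrelation(x, order):
    """r[k] = sum_n x[n] * x[n - k] for k = 0..order."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations for a_1..a_p with x[n] ~ sum_k a_k x[n - k].

    Returns (a, prediction_error_energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient k_i from the current prediction error.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # error energy shrinks at every order
    return a[1:], err


x = [0.9 ** n for n in range(200)]  # decaying signal that a 1st-order predictor fits
a, residual_energy = levinson_durbin(autocorrelation(x, 2), 2)  # a is close to [0.9, 0.0]
```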
How Does G.729 Work? (contd.)
3. Quantization: the LSP coefficients are quantized, and the quantized values are used throughout the rest of the algorithm.
4. Open-loop pitch analysis: since pitch analysis is complex, this step gives a rough estimate of the pitch delay, which the closed-loop search later refines.
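A rough open-loop pitch estimate can be obtained by picking the delay that maximizes the normalized autocorrelation of the signal. The sketch assumes a delay range of 20 to 143 samples (the range G.729 searches at 8 kHz) but, unlike G.729, it works on the raw input rather than perceptually weighted speech and does not favor shorter delays to avoid picking pitch multiples. `open_loop_pitch` is a hypothetical name.

```python
def open_loop_pitch(x, min_lag=20, max_lag=143):
    """Return the lag in [min_lag, max_lag] with the highest normalized correlation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        num = sum(x[n] * x[n - lag] for n in range(lag, len(x)))
        energy = sum(x[n - lag] ** 2 for n in range(lag, len(x)))
        if energy <= 0:
            continue  # no energy in the delayed segment
        score = num / energy ** 0.5
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


# Impulse train with a period of 50 samples (160 Hz at 8 kHz), 30 ms long.
x = [1.0 if n % 50 == 0 else 0.0 for n in range(240)]
estimate = open_loop_pitch(x)  # 50
```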
How Does G.729 Work? (contd.)
• These operations are performed twice per frame, i.e., once per subframe:
1. Adaptive codebook search: determines the exact pitch delay through a closed-loop search.
2. Fixed codebook search: models the remaining noise-like part of the excitation (dominant in unvoiced speech) by searching the codebook for the best codevector.
G.729-Based Speech Codec Process Flow
Figure: G.729-based speech codec process (encoder process flow and decoder process flow).
Specifications of G.729 Speech Codec:
• Speech coding algorithm: Conjugate-structure algebraic code-excited linear prediction (CS-ACELP);
• Speech sample stream: 16-bit PCM;
• Sampling rate: 8000 samples per second;
• Codec operates on: 10 ms speech frames corresponding to 80 samples;
• Input speech: 128 kbps;
• Encoded bit rate: 8 kbps;
• Compression ratio: 16:1;
• Arithmetic operations: fixed-point;
• Algorithmic delay: 15 ms (10 ms frame plus 5 ms look-ahead);
• Linear-prediction filter: 10th order;
• Windowing: asymmetric LP analysis window made of half a Hamming window and a quarter of a cosine cycle;
• Conversion of autocorrelation coefficients to LP coefficients: Levinson-Durbin algorithm;
• Adaptive codebook search for pitch delay: correlation method and interpolating filter;
• Adaptive codebook size: 14 bits per frame (8 bits in the 1st subframe, 5 bits in the 2nd subframe, plus 1 parity bit);
• Fixed codebook structure: algebraic codebook using an interleaved single-pulse permutation (ISSP) design;
• Fixed codebook search approach: focused search and depth-first tree search;
• Fixed codebook size: 34 bits per frame (17 bits per subframe);
• Excitation parameter determination: once per 5 ms subframe in the adaptive and fixed codebook searches;
• Output speech: 128 kbps.
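The algebraic (ISSP) fixed codebook in the specification above can be illustrated by building a single codevector. In the commonly described G.729 layout, each 40-sample subframe carries four pulses of amplitude ±1, one per track; three tracks have 8 positions (3 bits each) and the fourth has 16 positions (4 bits), giving 13 position bits plus 4 sign bits, i.e. 17 bits per subframe. The ordering of positions within track 3 and the names `position_bits` and `build_codevector` are assumptions for illustration, and the codebook search itself is not shown.

```python
# Pulse-position tracks for one 40-sample subframe.
TRACKS = [
    list(range(0, 40, 5)),  # track 0: 0, 5, ..., 35 (8 positions)
    list(range(1, 40, 5)),  # track 1: 1, 6, ..., 36 (8 positions)
    list(range(2, 40, 5)),  # track 2: 2, 7, ..., 37 (8 positions)
    sorted(list(range(3, 40, 5)) + list(range(4, 40, 5))),  # track 3: 3, 4, 8, 9, ..., 39 (16)
]


def position_bits(track):
    """Bits needed to index one position within a track."""
    return (len(track) - 1).bit_length()


def build_codevector(position_indices, signs, length=40):
    """Place one signed unit pulse per track; every other sample is zero."""
    c = [0.0] * length
    for track, idx, sign in zip(TRACKS, position_indices, signs):
        c[track[idx]] = float(sign)
    return c


bits_per_subframe = sum(position_bits(t) for t in TRACKS) + len(TRACKS)  # 13 + 4 = 17
c = build_codevector([1, 0, 7, 15], [+1, -1, +1, -1])  # pulses at 5, 1, 37, 39
```

Because each codevector has only four nonzero samples, filtering it and correlating it with the target is cheap, which is what makes searching this structured codebook practical in fixed-point hardware.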
