Speech Compression using GSM RPE-LTP
Faiza Nawaz
Bisma Hashmi
Mehrin Kiani
2
Introduction to GSM
• The Global System for Mobile Communications (GSM) is the most
popular standard for mobile phones in the world.
• GSM service is used by over 2 billion people across more than
212 countries and territories.
• The ubiquity of the GSM standard makes international roaming
very common between mobile phone operators.
• GSM differs significantly from its predecessors in that both its
signaling and speech channels are digital, giving digital call quality;
it is therefore considered a second-generation (2G) mobile phone
system.
3
Architecture of GSM
4
What is Speech?
• Speech Generation:
5
GSM 6.10 Vocoder
• Key principle: mathematical modeling of the human vocal tract,
leading to an efficient compression method for transmitting
speech.
• A vocoder (a blend of "voice" and "coder") is a coder tailored to
the compression of speech; the term is used to describe the GSM
speech codecs.
• The input is sampled at 8000 samples/s, and the encoded bit
stream has a bit rate of 13 kbit/s.
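The 13 kbit/s figure can be checked with simple frame arithmetic. The sketch below assumes the per-frame bit allocation of GSM 06.10 as we understand it (8 log-area ratios per frame, then lag, gain, grid position, scaling factor and 13 pulses per 40-sample sub-window); the slides themselves only state the input rate and the output rate.

```python
# Frame arithmetic for the GSM full-rate (06.10) codec.
# Assumption: the per-frame bit allocation below is our reading of GSM 06.10;
# the slides only state the 8000 samples/s input and the 13 kbit/s output.
SAMPLE_RATE = 8000      # input samples per second
BITS_PER_SAMPLE = 13    # input resolution (13-bit linear PCM)
FRAME_SAMPLES = 160     # one 20 ms frame
SUBFRAMES = 4           # four 40-sample sub-windows per frame

LAR_BITS = [6, 6, 5, 5, 4, 4, 3, 3]  # eight log-area ratios, sent once per frame
SUBFRAME_BITS = {
    "ltp_lag": 7,        # long-term predictor lag
    "ltp_gain": 2,       # long-term predictor gain
    "rpe_grid": 2,       # which of the four down-sampled grids was chosen
    "block_max": 6,      # APCM scaling factor
    "pulses": 13 * 3,    # 13 residual samples at 3 bits each
}

def frame_bits():
    """Total coded bits in one 20 ms frame."""
    return sum(LAR_BITS) + SUBFRAMES * sum(SUBFRAME_BITS.values())

raw_rate = SAMPLE_RATE * BITS_PER_SAMPLE                  # uncompressed bit/s
coded_rate = frame_bits() * SAMPLE_RATE // FRAME_SAMPLES  # coded bit/s
print(frame_bits(), coded_rate, raw_rate, raw_rate / coded_rate)
# 260 13000 104000 8.0
```

So 260 bits every 20 ms gives 13 kbit/s, an 8:1 reduction from the 104 kbit/s of raw 13-bit samples.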
6
GSM 6.10 Vocoder
• The coding scheme used by the GSM 6.10 vocoder is Regular Pulse
Excitation with Long-Term Prediction (RPE-LTP), a linear predictive
coder.
• The vocoder sends three kinds of information to the receiver:
  - whether the signal is voiced or unvoiced
  - (if it is voiced) the period of the excitation signal
  - the parameters of the prediction filter.
7
Linear Predictive Coder (LPC)
• The LPC algorithm assumes that each speech sample is a linear
combination of previous samples.
• Speech is sampled, stored and analyzed.
• Coefficients calculated from the samples are transmitted to and
processed in the receiver.
• The receiver uses them to reproduce both voiced and
unvoiced sounds.
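To make the prediction idea concrete, here is a minimal sketch of an LPC analysis filter (speech in, prediction residual out) and the matching synthesis filter at the receiver. The coefficient values and the test signal are hypothetical; in GSM the coefficients come from the short-term analysis stage and the residual is coded further before transmission.

```python
def lpc_residual(s, lpc):
    """Analysis: e[n] = s[n] - sum_i lpc[i] * s[n-1-i] (missing past samples count as 0)."""
    e = []
    for n in range(len(s)):
        pred = sum(lpc[i] * s[n - 1 - i] for i in range(len(lpc)) if n - 1 - i >= 0)
        e.append(s[n] - pred)
    return e

def lpc_synthesis(e, lpc):
    """Receiver side: s[n] = e[n] + sum_i lpc[i] * s[n-1-i], using already rebuilt samples."""
    s = []
    for n in range(len(e)):
        pred = sum(lpc[i] * s[n - 1 - i] for i in range(len(lpc)) if n - 1 - i >= 0)
        s.append(e[n] + pred)
    return s

lpc = [1.2, -0.5]                                    # hypothetical 2nd-order predictor
signal = [0.0, 1.0, 0.8, 0.3, -0.2, -0.5, -0.4, 0.0]  # hypothetical speech samples
residual = lpc_residual(signal, lpc)
rebuilt = lpc_synthesis(residual, lpc)
print(max(abs(a - b) for a, b in zip(signal, rebuilt)))
```

Given the same coefficients, the synthesis filter exactly undoes the analysis filter (up to floating-point rounding), which is why only the coefficients and a description of the residual need to be sent.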
8
Residual Pulse Excited (RPE) Coder
• Determines whether the signal is voiced or unvoiced.
• For voiced sounds, determines the period, encodes the periodicity and
transmits the coefficient.
• When the signal changes from voiced to unvoiced, the RPE coder transmits a
code that stops the receiver from generating periodic pulses.
• The receiver then starts generating random pulses to match the noise-like
nature of unvoiced sounds.
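The pulse/noise switching described above can be sketched as an excitation generator. This models the classic voiced/unvoiced excitation the slide describes and is not the GSM bitstream format; the pitch period of 57 samples is a hypothetical value.

```python
import random

def excitation(voiced, length, period=None, seed=0):
    """Voiced: periodic unit pulses every `period` samples. Unvoiced: uniform random noise."""
    if voiced:
        if not period or period <= 0:
            raise ValueError("voiced excitation needs a positive pitch period")
        return [1.0 if n % period == 0 else 0.0 for n in range(length)]
    rng = random.Random(seed)   # seeded so the example is reproducible
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]

voiced = excitation(True, 160, period=57)   # hypothetical pitch period in samples
unvoiced = excitation(False, 160)
print(sum(voiced))  # 3.0 (pulses at n = 0, 57, 114)
```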
9
GSM Compression Technologies
• The four GSM speech compression technologies are:
  - Full Rate
  - Enhanced Full Rate (EFR)
  - Adaptive Multi-Rate (AMR)
  - Half Rate
10
GSM Full Rate Vocoder Using RPE-LTP
• Described as an RPE-LTP linear predictive coder.
• Models the human vocal tract as a series of cylinders of
different widths.
• Forcing air through these cylinders generates speech sounds;
the LPC coder models this with a set of simultaneous equations.
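The "simultaneous equations" are usually written as the normal (Yule-Walker) equations of linear prediction. The derivation below is the standard least-squares one, using $a_j$ for the predictor weights (so $a_j$ corresponds to lpc[j-1] in the short-term analysis stage) and assuming the autocorrelation method.

```latex
s[n] \approx \sum_{j=1}^{P} a_j\, s[n-j], \qquad
E = \sum_{n} \Big( s[n] - \sum_{j=1}^{P} a_j\, s[n-j] \Big)^2

\frac{\partial E}{\partial a_i} = 0
\;\Longrightarrow\;
\sum_{j=1}^{P} a_j\, R(|i-j|) = R(i), \quad i = 1, \dots, P,
\qquad R(k) = \sum_{n} s[n]\, s[n-k]
```

These are $P$ linear equations in $P$ unknowns; because the matrix $R(|i-j|)$ is Toeplitz, recursions such as Schur's solve them far more cheaply than general elimination.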
11
GSM Full Rate Vocoder Using RPE-LTP (…contd)
• The input to the RPE-LTP coder is 20 ms of speech,
composed of 160 samples with 13-bit resolution each.
• The data is first passed through a pre-emphasis filter, which:
  - enhances the high-frequency components of the signal (better
    transmission efficiency);
  - also removes any offset on the signal (simplifies computation).
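A floating-point sketch of this preprocessing follows. It assumes, per our reading of GSM 06.10, that the work is split into an offset-compensation (DC-removal) filter followed by a first-order pre-emphasis filter, with constants ALPHA = 32735/32768 and BETA = 28180/32768; the real codec uses fixed-point arithmetic.

```python
# Assumption: ALPHA and BETA are the GSM 06.10 constants as we understand them.
ALPHA = 32735 / 32768   # pole of the offset-compensation (DC-removal) filter
BETA = 28180 / 32768    # pre-emphasis coefficient (about 0.86)

def preprocess(samples):
    """Return offset-compensated, pre-emphasized samples."""
    out = []
    prev_in = 0.0   # x[k-1]
    prev_of = 0.0   # offset-compensated value s_of[k-1]
    for x in samples:
        # Offset compensation: s_of[k] = x[k] - x[k-1] + ALPHA * s_of[k-1]
        s_of = x - prev_in + ALPHA * prev_of
        # Pre-emphasis: y[k] = s_of[k] - BETA * s_of[k-1] boosts high frequencies
        out.append(s_of - BETA * prev_of)
        prev_in, prev_of = x, s_of
    return out

dc = preprocess([1000.0] * 20000)                                 # constant offset
nyquist = preprocess([1000.0 * (-1) ** k for k in range(20000)])  # fastest alternation
print(abs(dc[-1]), abs(nyquist[-1]))
```

The constant input decays toward zero (the offset is removed), while the rapidly alternating input settles at roughly 1.86 times its amplitude (the high frequencies are emphasized).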
12
LPC Speech Generation
• Speech generation can be modeled as air passing through
a set of cylinders of different sizes.
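The cylinder model connects directly to the log-area ratios used in the short-term analysis stage. In a lossless tube model, each junction between adjacent sections is described by a reflection coefficient, and the log of the ratio of the two areas can be recovered from it. The sketch assumes one common sign convention (texts differ) and hypothetical area values.

```python
import math

def reflection_coefficients(areas):
    """Reflection coefficient at each junction of adjacent tube sections.
    Assumed sign convention (texts differ): r = (A[k+1] - A[k]) / (A[k+1] + A[k])."""
    return [(a2 - a1) / (a2 + a1) for a1, a2 in zip(areas, areas[1:])]

def log_area_ratio(r):
    """LAR = log10((1 + r) / (1 - r)); with the convention above this is log10(A[k+1] / A[k])."""
    return math.log10((1 + r) / (1 - r))

areas = [2.0, 3.5, 1.0, 4.0]   # hypothetical cross-sectional areas, arbitrary units
rs = reflection_coefficients(areas)
for (a1, a2), r in zip(zip(areas, areas[1:]), rs):
    print(f"r = {r:+.3f}   LAR = {log_area_ratio(r):+.3f}   log10(A2/A1) = {math.log10(a2 / a1):+.3f}")
```

Since $(1+r)/(1-r) = A_{k+1}/A_k$ under this convention, the "log-area ratio" is literally the log of the ratio of neighboring tube areas.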
13
Short Term Analysis Stage
• Autocorrelation of the input is used to calculate a set of eight
reflection coefficients.
• Schur recursion is used to efficiently solve the resulting set of
equations.
• The reflection coefficients are then converted into log-area ratios
(LARs), which can be quantized well in a smaller number of bits;
the eight LARs are the first eight parameters of the transmission
stream.
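The three steps above (autocorrelation, Schur recursion, LAR conversion) can be sketched in floating point as follows. The Schur recursion here is the textbook generator form with the sign convention $k_1 = -R(1)/R(0)$; the piecewise-linear LAR mapping and its breakpoints (0.675, 0.950) reflect our understanding of GSM 06.10, which uses this cheap approximation in place of an exact logarithm. The test frame is hypothetical.

```python
import math
import random

def autocorrelation(x, p=8):
    """R(k) = sum_n x[n] * x[n-k] over one frame, for k = 0..p."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(p + 1)]

def schur(R):
    """Reflection coefficients k_1..k_p from R(0..p) via the Schur recursion.
    Sign convention: k_1 = -R(1) / R(0)."""
    p = len(R) - 1
    u, v = list(R), list(R)   # forward and backward generator sequences
    ks = []
    for m in range(1, p + 1):
        if v[m - 1] <= 0.0:   # silent frame (or numerical breakdown): pad with zeros
            ks.extend([0.0] * (p - m + 1))
            break
        k = -u[m] / v[m - 1]
        ks.append(k)
        new_u, new_v = u[:], v[:]
        for i in range(m, p + 1):   # only entries i >= m are needed later
            new_u[i] = u[i] + k * v[i - 1]
            new_v[i] = v[i - 1] + k * u[i]
        u, v = new_u, new_v
    return ks

def gsm_lar(r):
    """Piecewise-linear log-area-ratio mapping (GSM 06.10 style, as we understand it)."""
    a = abs(r)
    if a < 0.675:
        lar = a
    elif a < 0.950:
        lar = 2.0 * a - 0.675
    else:
        lar = 8.0 * a - 6.375
    return lar if r >= 0 else -lar

rng = random.Random(1)
frame = [math.sin(0.3 * n) + 0.3 * rng.gauss(0.0, 1.0) for n in range(160)]  # hypothetical frame
ks = schur(autocorrelation(frame))
lars = [gsm_lar(k) for k in ks]
print([round(k, 3) for k in ks])
```

The Schur recursion only ever produces reflection coefficients (never the direct-form weights), which is exactly what the LAR conversion needs.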
14
Short Term Analysis Stage (…contd)
• The coded LARs are then decoded back into reflection coefficients
and used to filter the input samples.
• The LARs are decoded in the encoder too, so that the encoder filters
with exactly the same information that will be available at the decoder.
• An array of weights lpc[P] is computed such that
$s[n] \approx \mathrm{lpc}[0]\,s[n-1] + \mathrm{lpc}[1]\,s[n-2] + \cdots + \mathrm{lpc}[P-1]\,s[n-P]$
(P is usually between 8 and 14; GSM uses P = 8.)
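The decoding path can be sketched as: LARs back to reflection coefficients, reflection coefficients to the lpc[] weights via the standard step-up recursion, then the prediction formula on this slide. The inverse LAR breakpoints (0.675, 1.225) are our understanding of GSM 06.10, the decoded LAR values are hypothetical, and the sign convention ($k_1 = -R(1)/R(0)$, lpc[i] $= -a_{i+1}$) is an assumption chosen so that lpc[] matches the formula above.

```python
def gsm_lar_to_r(lar):
    """Inverse of the piecewise-linear LAR mapping (GSM 06.10 style, as we understand it)."""
    a = abs(lar)
    if a < 0.675:
        r = a
    elif a < 1.225:
        r = 0.5 * a + 0.3375
    else:
        r = 0.125 * a + 0.796875
    return r if lar >= 0 else -r

def reflection_to_lpc(ks):
    """Step-up recursion: reflection coefficients -> predictor weights lpc[0..P-1].
    Builds A(z) = 1 + a_1 z^-1 + ... + a_P z^-P with a_m = k_m at each order m,
    then returns lpc[i] = -a_{i+1} so that s[n] ~ sum_i lpc[i] * s[n-1-i]."""
    a = []
    for m, k in enumerate(ks, start=1):
        a = [a[i] + k * a[m - 2 - i] for i in range(m - 1)] + [k]
    return [-c for c in a]

def predict(s, n, lpc):
    """s[n] ~ lpc[0]*s[n-1] + lpc[1]*s[n-2] + ... + lpc[P-1]*s[n-P]  (requires n >= P)."""
    return sum(w * s[n - 1 - i] for i, w in enumerate(lpc))

lars = [0.5, -0.925]                       # hypothetical decoded LARs
ks = [gsm_lar_to_r(x) for x in lars]       # roughly [0.5, -0.8]
lpc = reflection_to_lpc(ks)
print(ks, lpc, predict([1.0, 2.0, 3.0], 2, lpc))
```

For example, reflection coefficients $(-1/2, 1/3)$ step up to lpc = $(2/3, -1/3)$, which solves the order-2 normal equations for $R = (2, 1, 0)$.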
15
Long Term Prediction Stage
• The 160 samples are split into 4 sub-windows of 40
samples each.
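The framing is simple slicing; this sketch splits a sample stream into 160-sample frames and each frame into its four 40-sample sub-windows. Dropping a trailing partial frame is a simplification for the example; a real encoder would buffer it until the next call.

```python
FRAME = 160      # samples per 20 ms frame at 8000 samples/s
SUBFRAME = 40    # samples per long-term prediction sub-window

def frames(samples):
    """Yield complete 160-sample frames; a trailing partial frame is simply dropped here."""
    for start in range(0, len(samples) - FRAME + 1, FRAME):
        yield samples[start:start + FRAME]

def subframes(frame):
    """Split one frame into its four 40-sample sub-windows."""
    if len(frame) != FRAME:
        raise ValueError("expected a 160-sample frame")
    return [frame[j:j + SUBFRAME] for j in range(0, FRAME, SUBFRAME)]

signal = list(range(400))          # 2.5 frames of dummy sample values
fs = list(frames(signal))
print(len(fs), [sw[0] for sw in subframes(fs[1])])
# 2 [160, 200, 240, 280]
```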
16
Long Term Prediction Stage (…contd)
• The long-term predictor produces two parameters for
each sub-window: the lag and the gain.
• The LTP lag describes where in the past the copied segment comes from.
• The LTP gain describes the scaling factor applied to that copy.
17
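Calculating Lag and Gain
• Lag:
Compute the resemblance by correlation. The correlation of x[n]
and y[n] at a given lag is the sum of products
$C(\text{lag}) = \sum_{n} x[n]\, y[n-\text{lag}]$,
and the lag with the maximum correlation is chosen.
• Gain:
The maximum correlation divided by the energy of the
reconstructed short-term residual signal over the selected segment.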
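The lag and gain search can be sketched directly from these definitions. Here x is the current 40-sample short-term residual and y is the previously reconstructed residual. The lag range 40 to 120 is our understanding of the GSM 06.10 range (a 7-bit field); setting the gain to 0 for non-positive correlation and leaving it unquantized are simplifications of this sketch. The history and the copied segment are hypothetical test data.

```python
import random

def ltp_search(d, dp, min_lag=40, max_lag=120):
    """Pick the LTP lag and gain for one 40-sample sub-window.

    d  -- current short-term residual sub-window (40 samples)
    dp -- previously reconstructed short-term residual, oldest first; dp[-1] is the
          sample just before d[0], so dp[len(dp) - lag + k] plays the role of y[k - lag]
    """
    n = len(d)
    if min_lag < n or len(dp) < max_lag:
        raise ValueError("need min_lag >= len(d) and at least max_lag history samples")
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        start = len(dp) - lag
        corr = sum(x * y for x, y in zip(d, dp[start:start + n]))   # sum of x[k] * y[k - lag]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    start = len(dp) - best_lag
    energy = sum(y * y for y in dp[start:start + n])   # energy of the selected past segment
    gain = best_corr / energy if energy > 0.0 and best_corr > 0.0 else 0.0
    return best_lag, gain

rng = random.Random(0)
history = [rng.uniform(-1.0, 1.0) for _ in range(120)]      # hypothetical past residual
current = [0.5 * history[120 - 57 + k] for k in range(40)]  # half-size copy from 57 samples back
lag, gain = ltp_search(current, history)
print(lag, round(gain, 3))
```

Because the current sub-window was built as a half-amplitude copy from 57 samples back, the search should recover lag 57 and gain 0.5.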
18
Residual Pulse Encoding
• To remove the long-term predictable part of its input, the algorithm
then subtracts the scaled 40-sample segment (the lagged copy
multiplied by the gain).
• The remaining residual signal is either weak or random and
consequently cheaper to encode and transmit.
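The subtraction step can be sketched as follows, with hypothetical test data built so that the current sub-window is a scaled copy of the past plus a small unpredictable part. After subtracting gain times the lagged copy, only that small part remains, which is why the residual is cheaper to code.

```python
def ltp_residual(d, dp, lag, gain):
    """Subtract the scaled 40-sample copy from the current sub-window.

    prediction[k] = gain * y[k - lag], where y[k - lag] is dp[len(dp) - lag + k]
    (dp holds the past reconstructed short-term residual, newest sample last).
    """
    n = len(d)
    if not n <= lag <= len(dp):
        raise ValueError("lag must lie between len(d) and len(dp)")
    start = len(dp) - lag
    return [x - gain * y for x, y in zip(d, dp[start:start + n])]

def energy(x):
    return sum(v * v for v in x)

history = [float((7 * n) % 11 - 5) for n in range(120)]  # hypothetical past residual
noise = [0.1 * ((3 * k) % 5 - 2) for k in range(40)]      # small unpredictable part
current = [0.8 * history[120 - 60 + k] + e for k, e in enumerate(noise)]
residual = ltp_residual(current, history, lag=60, gain=0.8)
print(round(energy(current), 2), round(energy(residual), 2))
```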
19
Residual Signal (…contd)
• The algorithm down-samples by a factor of three,
discarding two out of every three sample values.
• This results in four evenly spaced 13-value subsequences to
choose from, starting with samples 1, 2, 3 and 4.
• The algorithm picks the subsequence with the most energy.
• That leaves 13 3-bit sample values and a 6-bit
scaling factor, which turns the PCM encoding into an
APCM (adaptive PCM) encoding.
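Grid selection and the APCM step can be sketched as below. The grid logic follows the slide (four 13-value subsequences, pick the most energetic). The quantizer is a simplification and an assumption: it uses uniform 3-bit levels $(2c-7)/8 \cdot x_{max}$ and keeps the block maximum as a float, whereas the real codec also quantizes $x_{max}$ to 6 bits. The residual is hypothetical.

```python
def rpe_grid_select(x):
    """Down-sample a 40-sample residual by 3 and keep the candidate grid with most energy.
    Grid m starts at index m (m = 0..3, i.e. samples 1..4 counting from 1); 13 values each."""
    if len(x) != 40:
        raise ValueError("expected a 40-sample sub-window")
    grids = [x[m::3][:13] for m in range(4)]   # grid 0 would have 14 values; keep 13
    energies = [sum(v * v for v in g) for g in grids]
    best = max(range(4), key=energies.__getitem__)
    return best, grids[best]

def apcm_quantize(seq):
    """Simplified APCM: one block scale xmax plus a 3-bit code (0..7) per sample.
    Assumption: uniform levels (2c - 7) / 8 * xmax; the real codec also quantizes
    xmax to 6 bits, which is skipped here."""
    xmax = max(abs(v) for v in seq)
    if xmax == 0.0:
        return 0.0, [0] * len(seq)   # silent sub-window: every code decodes to 0
    # (v / xmax + 1) * 4 lies in [0, 8]; int() floors it because it is non-negative
    codes = [min(7, int((v / xmax + 1.0) * 4.0)) for v in seq]
    return xmax, codes

def apcm_dequantize(xmax, codes):
    return [(2 * c - 7) / 8 * xmax for c in codes]

residual = [0.0] * 40
for i in range(2, 40, 3):                     # hypothetical residual living on grid 2
    residual[i] = (-1) ** i * (i + 1) / 40.0
grid, seq = rpe_grid_select(residual)
xmax, codes = apcm_quantize(seq)
approx = apcm_dequantize(xmax, codes)
print(grid, codes)
```

Scaling by the block maximum first is what makes this adaptive: the same eight 3-bit levels cover loud and quiet sub-windows alike, with a worst-case error of $x_{max}/8$ per sample in this simplified quantizer.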
20
Speech Decoder
• The decoder consists of three parts:
  - RPE decoding
  - LTP synthesis filter
  - LPC short-term synthesis filter
21
Speech Decoder (…contd)
22
Speech Decoder (…contd)
• The algorithm multiplies the 13 3-bit samples by the scaling factor and
expands them back into 40 samples, zero-padding the gaps.
• The resulting residual pulse sequence is fed to the long-term synthesis filter.
• A 40-sample segment is cut from the previously estimated short-term residual
signal, scaled by the LTP gain and added to the incoming pulses.
• The estimated short-term residual signal then passes through the short-term
synthesis filter, whose reflection coefficients are calculated by the
LPC module.
• The noise-like excitation from the long-term synthesis filter passes through
the tubes of the simulated vocal tract and emerges as speech.
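The first two decoder stages can be sketched as follows. rpe_decode assumes the same simplified uniform 3-bit levels $(2c-7)/8 \cdot x_{max}$ as a hypothetical stand-in for the codec's tables, and the 120-sample history length is an assumption matching a maximum lag of 120. The final short-term synthesis filter, $s[n] = e[n] + \sum_i \mathrm{lpc}[i]\, s[n-1-i]$, is left out for brevity.

```python
def rpe_decode(grid, xmax, codes, n=40):
    """Scale the 3-bit codes by the block maximum and put them back on every third
    sample starting at `grid`; the gaps are zero-padded.
    Assumption: simplified uniform levels (2c - 7) / 8 * xmax."""
    if grid + 3 * (len(codes) - 1) >= n:
        raise ValueError("pulses do not fit in the sub-window")
    out = [0.0] * n
    for j, c in enumerate(codes):
        out[grid + 3 * j] = (2 * c - 7) / 8 * xmax
    return out

def ltp_synthesis(pulses, history, lag, gain, keep=120):
    """Long-term synthesis for one sub-window: d'[k] = pulses[k] + gain * d'[k - lag].
    `history` is the previously reconstructed short-term residual (newest last); it is
    updated in place and trimmed to the `keep` most recent samples (the largest lag)."""
    n = len(pulses)
    if not n <= lag <= len(history):
        raise ValueError("lag must lie between len(pulses) and len(history)")
    start = len(history) - lag
    out = [e + gain * y for e, y in zip(pulses, history[start:start + n])]
    history.extend(out)
    del history[:-keep]
    return out

history = [0.0] * 120                        # decoder starts from silence
first = ltp_synthesis([1.0] + [0.0] * 39, history, lag=40, gain=0.5)   # hand-made pulse
second = ltp_synthesis([0.0] * 40, history, lag=40, gain=0.5)          # no new pulses
print(first[0], second[0], len(history))
# 1.0 0.5 120
```

Even with no new pulses, the second sub-window repeats the earlier pulse at half amplitude, which is how the long-term synthesis filter restores periodicity (pitch) in voiced speech.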
23
QUESTIONS?
