Adaptive Signal Processing Term Paper 2015 DISHA MODI (Roll No:15MECC12) 1
Abstract: The past decade has seen progress toward the adoption
of low-rate speech coders in public and military communications.
Essential to this progress is that the new speech coders achieve
high-quality speech at low data rates. These coders include
mechanisms to represent the spectral properties of speech, to
match the speech waveform, and to optimize the coder's
performance for the human ear. Several of them have been adopted
in cellular telephony standards.
Service providers in mobile communications constantly face the
challenge of accommodating more users within a limited allocated
bandwidth. For this reason, they are constantly in search of
low-bit-rate speech coders that deliver high-quality speech.
In this paper, low-bit-rate coding of speech signals using
Linear Predictive Coding (LPC) is simulated in MATLAB.
Index Terms: Autocorrelation, Formants, LPC, Levinson-Durbin
recursion.
I. INTRODUCTION
"LPC was first introduced as a method for encoding human
speech by the United States Department of Defense in federal
standard 1015, published in 1984" [1]. Human speech is produced
in the vocal tract, which can be approximated as a tube of
varying diameter. The linear predictive coding (LPC) model is a
mathematical approximation of the vocal tract represented by
this tube. At any given time, the speech sample is approximated
as a linear combination of the p previous samples. The key
component of LPC is the linear predictive filter, which
determines the value of the next sample from a linear
combination of previous samples. "In normal scenario, speech is
sampled at 8000 samples/second with 8 bits quantization. This
delivers data rate of 64000 bits/second. Linear predictive
coding drops this to 2400 bits/second." [1] At this rate the
speech has a distinctly synthetic sound and there is an obvious
loss of quality. However, the speech remains audible and easily
understandable. LPC is therefore a lossy form of compression.
Lossy algorithms are sometimes considered acceptable because the
loss of quality is often undetectable to the human ear. In
conversation, silence takes up more than 50% of the time, so an
easy way to save bandwidth is not to transmit the silence.
Another important property of speech production is that, because
of its mechanical nature, adjacent samples of speech are highly
correlated.
II. LPC SYSTEM IMPLEMENTATION
The filter model used in LPC is known as the linear predictive
filter. It has two key components: analysis / encoding and
synthesis / decoding.
III. LPC ANALYSIS/ENCODING
The encoding part of LPC involves examining the speech signal
and breaking it down into segments.
Fig. 1 LPC encoder block-diagram
Linear prediction (LP) methods have long been used in control
and information theory, where they are called methods of system
estimation and system identification. In speech processing they
are used extensively under the following group of names, taken
from [7]:
1. covariance method
2. autocorrelation method
3. lattice method
4. inverse filter formulation
5. spectral estimation formulation
6. maximum likelihood method
7. inner product method
A. Input speech
Under normal circumstances, the input signal is sampled at a
rate of 8000 samples per second. The input signal is then broken
into segments, each of which is encoded and transmitted to the
receiver. The 8000 samples in each second of speech are broken
into segments of 180 samples, so each segment represents 22.5
milliseconds of the input speech signal.
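The segmentation step can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions: the function name split_into_frames and the choice to zero-pad the last, incomplete segment are not taken from the paper; the sample rate and segment length are the values quoted above.

```python
import numpy as np

SAMPLE_RATE = 8000      # samples per second, as stated in the text
FRAME_LENGTH = 180      # samples per segment, as stated in the text


def split_into_frames(signal, frame_length=FRAME_LENGTH):
    """Split a 1-D signal into non-overlapping frames.

    Assumption: the last frame is zero-padded to full length.
    """
    signal = np.asarray(signal, dtype=float)
    n_frames = int(np.ceil(len(signal) / frame_length))
    padded = np.zeros(n_frames * frame_length)
    padded[:len(signal)] = signal
    return padded.reshape(n_frames, frame_length)


# One second of placeholder data standing in for sampled speech
one_second = np.random.default_rng(0).standard_normal(SAMPLE_RATE)
frames = split_into_frames(one_second)

# Duration of one segment in milliseconds: 180 / 8000 s = 22.5 ms
frame_duration_ms = 1000.0 * FRAME_LENGTH / SAMPLE_RATE
```

Because 8000 is not a multiple of 180, one second of speech spans 44 full segments plus a partial one, which is why the segment rate is about 44.4 segments per second rather than a whole number.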
B. Voiced/Unvoiced Determination
In the LPC algorithm, before a speech segment is classified as
voiced or unvoiced it is first passed through a low-pass filter
with a bandwidth of 1 kHz. It is important to determine whether
a segment is voiced or unvoiced because voiced sounds have a
different waveform than unvoiced sounds. The LPC encoder informs
the decoder whether a segment is voiced or unvoiced by sending a
single bit. Voiced sounds are generally vowels and can be
regarded as pulses similar to periodic waveforms. They have
large amplitudes and high energy levels, as well as distinct
formant (resonant) frequencies. Unvoiced sounds are usually
consonants; their waveforms are often random and chaotic, and
they have smaller amplitudes than voiced sounds and therefore
less energy.
The voiced/unvoiced decision is then confirmed by counting the
number of times the waveform crosses the x-axis and comparing
that count with the normal range of values (threshold values)
for most unvoiced and voiced sounds.
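A minimal Python sketch of the zero-crossing test follows. The threshold of 50 crossings per 180-sample segment is a hypothetical value chosen for illustration, not one given in the paper, and the 1 kHz low-pass pre-filter is omitted for brevity.

```python
import numpy as np


def zero_crossing_count(frame):
    """Count sign changes between consecutive samples of a frame."""
    signs = np.signbit(np.asarray(frame, dtype=float))
    return int(np.count_nonzero(signs[1:] != signs[:-1]))


def is_voiced(frame, zc_threshold=50):
    """Hypothetical rule: few zero crossings suggests a voiced frame."""
    return zero_crossing_count(frame) < zc_threshold


n = np.arange(180)
# A 120 Hz tone sampled at 8000 Hz stands in for a voiced segment
voiced_like = np.sin(2 * np.pi * 120 * n / 8000)
# White noise stands in for an unvoiced segment
unvoiced_like = np.random.default_rng(1).standard_normal(180)
```

A practical classifier would usually combine this count with the segment energy, since the text notes that voiced sounds also carry more energy than unvoiced ones.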
Speech Compression using LPC
Disha Modi, M.Tech (Communication),
Electronics and Communication Department
Institute of Technology - Nirma University
C. Pitch Period Estimation
The pitch period can be thought of as the period of the vocal
cord vibration that occurs during the production of voiced
speech. The pitch period is therefore required only for decoding
voiced segments; it is not needed for unvoiced segments, which
are produced by turbulent air flow rather than vocal cord
vibration. One type of algorithm takes advantage of the fact
that the autocorrelation of a periodic function, Rxx(k), has a
maximum when k equals the pitch period. These algorithms usually
detect the maximum by checking the autocorrelation value against
a threshold. One problem with autocorrelation-based algorithms
is that their results are susceptible to interference from other
resonances in the vocal tract; when interference occurs, the
algorithm cannot guarantee accurate results. Another problem
arises because voiced speech is not perfectly periodic, so the
maximum is lower than it would be for a truly periodic signal.
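The autocorrelation approach can be illustrated with the following Python sketch. The lag search range (20 to 160 samples) and the normalized threshold of 0.3 are assumptions made for the example, and the function name estimate_pitch_period is hypothetical.

```python
import numpy as np


def estimate_pitch_period(frame, min_lag=20, max_lag=160, threshold=0.3):
    """Return the lag (in samples) of the autocorrelation peak, or None.

    min_lag, max_lag and threshold are illustrative assumptions.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    energy = float(np.dot(frame, frame))      # this is Rxx(0)
    max_lag = min(max_lag, len(frame) - 1)
    if energy == 0.0 or max_lag < min_lag:
        return None
    lags = np.arange(min_lag, max_lag + 1)
    # Rxx(k) = sum_n x[n] * x[n - k], normalized by Rxx(0)
    r = np.array([np.dot(frame[k:], frame[:-k]) for k in lags]) / energy
    best = int(np.argmax(r))
    # Accept the peak only if it clears the threshold
    return int(lags[best]) if r[best] >= threshold else None


# A sinusoid with a period of 80 samples (100 Hz at 8000 samples/second)
test_frame = np.sin(2 * np.pi * np.arange(400) / 80)
period = estimate_pitch_period(test_frame)
```

Because the sum runs over fewer terms as the lag grows, the peak at the true period is lower than Rxx(0) even for a perfectly periodic input, which compounds the effect described above for real speech.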
D. Vocal Tract Filter
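The filter that the decoder uses to re-create the original input
signal is defined by a set of coefficients. To find the filter
coefficients that best match the current segment, the encoder
minimizes the mean squared error, which is expressed as

\[ e_n^2 = \Bigl( y_n - \sum_{i=1}^{M} a_i\, y_{n-i} \Bigr)^2 \]

where $\{y_n\}$ is the set of speech samples in the current
segment and $\{a_i\}$ is the set of filter coefficients. Setting
the derivative of the expected squared error with respect to
each coefficient $a_j$ to zero gives

\[ \frac{\partial}{\partial a_j} E\Bigl[ \Bigl( y_n - \sum_{i=1}^{M} a_i\, y_{n-i} \Bigr)^2 \Bigr] = 0 \]

\[ -2\,E\Bigl[ \Bigl( y_n - \sum_{i=1}^{M} a_i\, y_{n-i} \Bigr)\, y_{n-j} \Bigr] = 0 \]

\[ \sum_{i=1}^{M} a_i\, E[\, y_{n-i}\, y_{n-j} \,] = E[\, y_n\, y_{n-j} \,], \qquad j = 1, \dots, M \]

(using the fact that the expectation $E[\cdot]$ is linear).
Taking the derivative yields a set of M equations. To solve for
the filter coefficients, $E[\, y_{n-i}\, y_{n-j} \,]$ has to be
estimated.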
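Autocorrelation is the approach explained here for linear
predictive coding. It requires two initial assumptions about the
sequence of speech samples, $\{y_n\}$, in the current segment:
first, that $\{y_n\}$ is stationary, and second, that $\{y_n\}$
is zero outside the current segment. Under these assumptions
each $E[\, y_{n-i}\, y_{n-j} \,]$ becomes an autocorrelation
function of the form $R_{yy}(|i-j|)$. For a segment of $N$
samples $y_0, \dots, y_{N-1}$, the autocorrelation function
$R_{yy}(k)$ can be estimated as

\[ R_{yy}(k) = \sum_{n=k}^{N-1} y_n\, y_{n-k} \]

Using $R_{yy}(k)$, the M equations obtained from the derivative
of the mean squared error can be written in matrix form
$\mathbf{R}\mathbf{A} = \mathbf{P}$, where

\[ \mathbf{R} = \begin{bmatrix} R_{yy}(0) & R_{yy}(1) & \cdots & R_{yy}(M-1) \\ R_{yy}(1) & R_{yy}(0) & \cdots & R_{yy}(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ R_{yy}(M-1) & R_{yy}(M-2) & \cdots & R_{yy}(0) \end{bmatrix}, \quad \mathbf{A} = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{bmatrix}, \quad \mathbf{P} = \begin{bmatrix} R_{yy}(1) \\ R_{yy}(2) \\ \vdots \\ R_{yy}(M) \end{bmatrix} \]

and $\mathbf{A}$ contains the filter coefficients. To determine
the filter coefficients, the equation
$\mathbf{A} = \mathbf{R}^{-1}\mathbf{P}$ must be solved, which
requires first computing $\mathbf{R}^{-1}$. This is an easy
computation if one observes that $\mathbf{R}$ is symmetric and
that all elements along each diagonal are equal. This type of
matrix is called a Toeplitz matrix and can be inverted
efficiently [1].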
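As a reference point before turning to the recursive solution, the system RA = P can be solved directly. The following Python sketch uses a general-purpose linear solver from NumPy rather than exploiting the Toeplitz structure; the function names autocorrelation and lpc_direct are hypothetical and are not taken from the paper's MATLAB code.

```python
import numpy as np


def autocorrelation(frame, max_lag):
    """R_yy(k) = sum_n y[n] * y[n - k] for k = 0..max_lag.

    The frame is treated as zero outside its own samples.
    """
    y = np.asarray(frame, dtype=float)
    return np.array([np.dot(y[k:], y[:len(y) - k])
                     for k in range(max_lag + 1)])


def lpc_direct(frame, order):
    """Solve R a = p for the predictor coefficients a_1..a_M."""
    r = autocorrelation(frame, order)
    # Toeplitz matrix: entry (i, j) is R_yy(|i - j|)
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    R = r[idx]
    p = r[1:order + 1]
    return np.linalg.solve(R, p)
```

A general solver needs on the order of M^3 operations for an M-by-M system, whereas a recursion that uses the Toeplitz structure needs on the order of M^2, which is the motivation for the algorithm described next.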
The Levinson-Durbin (L-D) Algorithm is a recursive
algorithm that is considered very computationally efficient
since it takes advantage of the properties of R when
determining the filter coefficients.
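L-D Algorithm [2]
The basic ideas behind the recursion are, first, that it is easy
to solve the system for k = 1 and, second, that it is also very
simple to solve a problem with k + 1 coefficients once the
problem with k coefficients has been solved. In general none of
the coefficients of the different-sized problems match, so it is
not a way to calculate $a_{k+1}$ but a way to calculate the
whole vector $\mathbf{A}_{k+1}$ as a function of $\mathbf{A}_k$,
$\mathbf{R}_{k+1}$ and $E_k$. Thinking about it, "Levinson-Durbin
induction" would be a better name.
In this subsection $\mathbf{A}_k = [1, a_1, \dots, a_k]^T$ is the
prediction-error filter of order k, so its coefficients have the
opposite sign to the predictor coefficients in
$\mathbf{R}\mathbf{A} = \mathbf{P}$; $\mathbf{R}_k$ is the
$(k+1) \times (k+1)$ Toeplitz matrix with entries
$R_{yy}(|i-j|)$, and $E_k$ is the prediction error energy.
Solving the size 1 Problem
We are looking for $\mathbf{A}_1 = [1, a_1]^T$ so that
$\mathbf{R}_1 \mathbf{A}_1 = [E_1, 0]^T$ with

\[ \mathbf{R}_1 = \begin{bmatrix} R_{yy}(0) & R_{yy}(1) \\ R_{yy}(1) & R_{yy}(0) \end{bmatrix} \]

and $E_1$ is not necessary at this stage. The dot product of the
second line of $\mathbf{R}_1$ with $\mathbf{A}_1$ gives

\[ R_{yy}(1) + R_{yy}(0)\, a_1 = 0 \]

Therefore,

\[ a_1 = -\frac{R_{yy}(1)}{R_{yy}(0)} \quad \text{and} \quad E_1 = R_{yy}(0) + R_{yy}(1)\, a_1 \]

Solving the size k+1 Problem
Suppose that we have solved the size k problem and have found
$\mathbf{A}_k$ and $E_k$. Then we have

\[ \mathbf{R}_k \mathbf{A}_k = [E_k, 0, \dots, 0]^T \]

$\mathbf{R}_{k+1}$ has one more row and column than
$\mathbf{R}_k$, so we cannot apply it directly to $\mathbf{A}_k$.
However, if we extend $\mathbf{A}_k$ with a zero and call this
vector $\mathbf{U}_{k+1}$, we can apply $\mathbf{R}_{k+1}$ to it
and we get the following interesting result, with $a_0 = 1$:

\[ \mathbf{R}_{k+1} \mathbf{U}_{k+1} = \Bigl[ E_k, 0, \dots, 0, \sum_{j=0}^{k} a_j R_{yy}(k+1-j) \Bigr]^T \]

Since the matrix is symmetric, we also have something remarkable
when reversing the order of the coefficients of
$\mathbf{U}_{k+1}$ and calling this vector $\mathbf{V}_{k+1}$:

\[ \mathbf{R}_{k+1} \mathbf{V}_{k+1} = \Bigl[ \sum_{j=0}^{k} a_j R_{yy}(k+1-j), 0, \dots, 0, E_k \Bigr]^T \]

We can notice that a linear combination
$\mathbf{U}_{k+1} + \lambda \mathbf{V}_{k+1}$ is of the form
wanted for $\mathbf{A}_{k+1}$, since the first element is a 1 for
all values of $\lambda$. Now if there was a value of $\lambda$
for which

\[ \mathbf{R}_{k+1} (\mathbf{U}_{k+1} + \lambda \mathbf{V}_{k+1}) = [E_{k+1}, 0, \dots, 0]^T \]

we would have $\mathbf{A}_{k+1}$. Calculating
$\mathbf{R}_{k+1} (\mathbf{U}_{k+1} + \lambda \mathbf{V}_{k+1})$
gives

\[ \Bigl[ E_k + \lambda \sum_{j=0}^{k} a_j R_{yy}(k+1-j),\; 0, \dots, 0,\; \sum_{j=0}^{k} a_j R_{yy}(k+1-j) + \lambda E_k \Bigr]^T \]

so the last element vanishes for

\[ \lambda = -\frac{1}{E_k} \sum_{j=0}^{k} a_j R_{yy}(k+1-j), \qquad E_{k+1} = (1 - \lambda^2)\, E_k \]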
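The recursion can be sketched in Python as follows. This is an illustrative implementation, not the paper's MATLAB code; it assumes the prediction-error convention in which the coefficient vector starts with a 1 and the predictor coefficients of RA = P are the negatives of the remaining entries. The function name levinson_durbin and the returned list of per-step multipliers are choices made for the example.

```python
import numpy as np


def levinson_durbin(r, order):
    """Solve the Toeplitz system recursively.

    r: autocorrelation values r[0..order].
    Returns (A, E, ks) where A = [1, a_1, ..., a_order] is the
    prediction-error filter, E the final error energy and ks the
    multiplier found at each step (the reflection coefficients
    under this sign convention).
    """
    r = np.asarray(r, dtype=float)
    A = np.array([1.0])
    E = r[0]
    ks = []
    for k in range(order):
        # multiplier = -(sum_j a_j * r[k + 1 - j]) / E_k
        lam = -np.dot(A, r[k + 1:0:-1]) / E
        U = np.append(A, 0.0)      # A_k extended with a zero
        V = U[::-1]                # same coefficients, reversed
        A = U + lam * V            # A_{k+1}
        E = (1.0 - lam * lam) * E  # E_{k+1}
        ks.append(lam)
    return A, E, np.array(ks)


# Autocorrelation of a first-order process with correlation 0.5
A, E, ks = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this input the first step gives a multiplier of -0.5 and the second step gives 0, so the order-2 solution is just the order-1 solution extended with a zero, as expected for a first-order process.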
IV. TRANSMITTING THE PARAMETERS[1]
In its original form, speech is usually transmitted at 64,000
bits/second, using 8 bits/sample and a sampling rate of 8000 Hz.
LPC reduces this rate to 2,400 bits/second by breaking the
speech into segments and then sending, for each segment, the
voiced/unvoiced information, the pitch period, and the
coefficients of the filter that represents the vocal tract. The
excitation signal used by the filter at the receiver is
determined by the classification of the speech segment as voiced
or unvoiced and by the pitch period of the segment. The encoder
transmits a single bit to indicate whether the current segment
is voiced or unvoiced. The pitch period is quantized, and 6 bits
are required to represent it.
If the segment contains voiced speech, then a 10th-order filter
is used. This means that 11 values are needed: 10 reflection
coefficients and the gain. If the segment contains unvoiced
speech, then a 4th-order filter is used. This means that 5
values are needed: 4 reflection coefficients and the gain.
Quantization is done as follows:
1 bit voiced/unvoiced
6 bits pitch period (60 values)
10 bits k1 and k2 (5 each)
10 bits k3 and k4 (5 each)
16 bits k5, k6, k7, k8 (4 each)
3 bits k9
2 bits k10
5 bits gain G
1 bit synchronization
54 bits TOTAL BITS PER FRAME
Verification of the Bit Rate of LPC Speech Segments
Sample rate = 8000 samples/second
Samples per segment = 180 samples/segment
Segment rate = Sample rate / Samples per segment
= (8000 samples/second) / (180 samples/segment)
= 44.44... segments/second
Segment size = 54 bits/segment
Bit rate = Segment size * Segment rate
= (54 bits/segment) * (44.44... segments/second)
= 2400 bits/second
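The same arithmetic can be checked with a short Python snippet. The dictionary keys are labels chosen for readability; the bit counts are those of the quantization list above, and the 64,000 bits/second reference rate is the uncompressed rate quoted in the text.

```python
SAMPLE_RATE = 8000          # samples/second
SAMPLES_PER_SEGMENT = 180   # samples/segment

# Bit allocation per segment, copied from the quantization list
BITS = {
    "voiced/unvoiced": 1,
    "pitch period": 6,
    "k1, k2": 2 * 5,
    "k3, k4": 2 * 5,
    "k5..k8": 4 * 4,
    "k9": 3,
    "k10": 2,
    "gain": 5,
    "synchronization": 1,
}

bits_per_segment = sum(BITS.values())
segments_per_second = SAMPLE_RATE / SAMPLES_PER_SEGMENT
bit_rate = bits_per_segment * segments_per_second

# Reduction relative to the uncompressed 64,000 bits/second stream
reduction_factor = 64000 / bit_rate

print(bits_per_segment, round(bit_rate), round(reduction_factor, 2))
```

The reduction factor works out to roughly 26.7, that is, the LPC stream needs less than 4% of the bits of the uncompressed signal.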
V. LPC SYNTHESIS/DECODING
Fig. 2 LPC synthesizer/decoder block-diagram [4]
Decoding a sequence of speech segments is the reverse of the
encoding process. Each segment is decoded individually, and the
reproduced sound segments are joined together to represent the
entire input speech signal. The decoding, or synthesis, of a
speech segment is based on the 54 bits of information
transmitted by the encoder. Each segment of speech has its own
LPC filter, which is built from the reflection coefficients and
the gain received from the encoder: 10 reflection coefficients
for voiced segments and 4 for unvoiced segments. The reflection
coefficients are used to generate the vocal tract coefficients,
or parameters, which define the filter. The final step in
decoding a segment is to pass the excitation signal through the
filter to produce the synthesized speech signal.
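The decoding steps can be sketched in Python as follows. Several points are assumptions made for illustration and are not stated in the paper: the excitation is taken to be a unit impulse train for voiced segments and white noise for unvoiced ones, the reflection coefficients follow the sign convention in which the filter vector starts with a 1 and the synthesized sample is the scaled excitation minus the weighted past outputs, and the filter memory is reset at the start of every segment. All function names are hypothetical.

```python
import numpy as np


def reflection_to_predictor(ks):
    """Step-up recursion: reflection coefficients -> [1, a_1, ..., a_M]."""
    A = np.array([1.0])
    for k in ks:
        U = np.append(A, 0.0)
        A = U + k * U[::-1]
    return A


def make_excitation(n, voiced, pitch_period=None, rng=None):
    """Impulse train for voiced segments, white noise for unvoiced ones."""
    if voiced:
        e = np.zeros(n)
        e[::pitch_period] = 1.0    # one pulse every pitch period
        return e
    rng = np.random.default_rng() if rng is None else rng
    return rng.standard_normal(n)


def synthesize_frame(ks, gain, excitation):
    """All-pole filter: s[n] = gain * e[n] - sum_i a_i * s[n - i]."""
    A = reflection_to_predictor(ks)
    order = len(A) - 1
    s = np.zeros(len(excitation))
    for n in range(len(excitation)):
        acc = gain * excitation[n]
        # Only past samples inside this segment are used (memory reset)
        for i in range(1, min(order, n) + 1):
            acc -= A[i] * s[n - i]
        s[n] = acc
    return s


# One voiced segment: first-order filter, pitch period of 60 samples
segment = synthesize_frame([-0.5], gain=2.0,
                           excitation=make_excitation(180, True, 60))
```

Resetting the filter memory at each segment boundary keeps the sketch short but can produce audible discontinuities; carrying the last output samples over to the next segment is the usual remedy.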
VI. APPLICATION
In general, the most common use of speech compression is in
standard telephone systems. In fact, much of the technology used
in speech compression was developed by the phone companies.
Further applications of LPC and other speech compression schemes
include voice mail systems, telephone answering machines, and
multimedia applications. Most multimedia applications, unlike
telephone applications, involve one-way communication and
storage of the data.
SIMULATION RESULTS
Low-bit-rate coding of different speech signals using Linear
Predictive Coding (LPC) was simulated in MATLAB.
Fig. 3 Female Original Voice
Fig. 4 Female LPC coded Voice
Fig. 5 Male Original Voice
Fig. 6 Male LPC coded Voice
Performance measurements of the LPC-compressed signals (both
male and female) are shown in Table I. The SNR values in Table I
show that both the male and the female reconstructed signals are
noisy, since their SNR is low. For all levels of compression the
quality is better for the male signal than for the female
signal; on the other hand, the compression factor is larger for
the female signal than for the male signal. This result is
expected because the female voice contains more high-frequency
content than the male voice. It was also observed that no
further enhancement can be achieved beyond a certain level of
decomposition for either signal.
PARAMETER                       MALE      FEMALE
Sampling Rate                   8000      8000
File length (in seconds)        2.07      2.77
Length of Original Signal       99328     133120
Length of Constructed Signal    97920     132480
SNR (in dB)                     17.077    14.77
Compression Ratio               0.9858    0.9952
Table I Comparison of male and female LPC synthesized voice
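The paper does not state how the SNR and the compression ratio were computed. The Python sketch below shows one common definition of SNR, the ratio of signal power to reconstruction-error power over the samples the two signals share, as an assumption for illustration only. The tabulated compression ratios do match the ratio of constructed to original signal length, which the second function reproduces; both function names are hypothetical.

```python
import numpy as np


def snr_db(original, reconstructed):
    """SNR in dB over the samples both signals have in common.

    Assumed definition: 10 * log10(signal power / error power).
    """
    n = min(len(original), len(reconstructed))
    x = np.asarray(original[:n], dtype=float)
    y = np.asarray(reconstructed[:n], dtype=float)
    noise_power = np.sum((x - y) ** 2)
    if noise_power == 0.0:
        return float("inf")
    return 10.0 * np.log10(np.sum(x ** 2) / noise_power)


def length_ratio(original_length, constructed_length):
    """Ratio of constructed to original signal length."""
    return constructed_length / original_length


# Signal lengths taken from the table
male_ratio = length_ratio(99328, 97920)
female_ratio = length_ratio(133120, 132480)
```

A sample-by-sample SNR of this kind penalizes any phase or timing difference between the two waveforms, so it tends to understate the perceived quality of a vocoder such as LPC, which does not try to match the waveform.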
CONCLUSION
Linear Predictive Coding is an analysis/synthesis technique for
lossy speech compression that attempts to model the human
production of sound instead of transmitting an estimate of the
sound wave. It achieves a bit rate of 2400 bits/second, which
makes it well suited to secure telephone systems, where
preserving the content and meaning of speech matters more than
preserving its quality. The tradeoff for LPC's low bit rate is
that it has some difficulty with certain sounds and produces
speech that sounds synthetic. LPC encoders break a sound signal
into segments and send information about each segment to the
decoder. The encoder sends whether the segment is voiced or
unvoiced and, for voiced segments, the pitch period, which the
decoder uses to create an excitation signal. The encoder also
sends information about the vocal tract, which the decoder uses
to build a filter that, given the excitation signal as input,
can reproduce an approximation of the original speech.
REFERENCES
[1] J. Bradbury, "Linear Predictive Coding," 2000.
[2] C. Collomb, "1. Description of Linear Prediction 2. Minimizing the
error," pp. 1–7, 2009.
[3] D. R. Sandeep, "Compression and Enhancement of Speech Signals," no.
Seiscon, pp. 774–779, 2011.
[4] M. A. Osman, N. Al, H. M. Magboub, and S. A. Alfandi, "Speech
compression uses LPC and wavelet," pp. 92–99, 2010.
[5] V. Hardman and O. Hodson, "Internet/Mbone Audio," pp. 5–7, 2000.
[6] S. C. Douglas, "Introduction to Adaptive Filters," Digital Signal
Processing Handbook, pp. 7–12, 1999.
[7] D. S. Processing, "Digital Speech Processing, Lecture 13: Linear
Predictive Coding (LPC), Introduction, LPC Methods."
[8] H. V. Poor, C. G. Looney, R. J. Marks II, S. Verdú, J. A. Thomas,
and T. M. Cover, "Information Theory," The Electrical Engineering
Handbook, pp. 56–57, 2000.

More Related Content

What's hot

Linear predictive coding documentation
Linear predictive coding  documentationLinear predictive coding  documentation
Linear predictive coding documentation
chakravarthy Gopi
 
lpc and horn noise detection
lpc and horn noise detectionlpc and horn noise detection
lpc and horn noise detection
Pranathi V.N Vemuri
 
multirate signal processing for speech
multirate signal processing for speechmultirate signal processing for speech
multirate signal processing for speech
Rudra Prasad Maiti
 
Linear Predictive Coding
Linear Predictive CodingLinear Predictive Coding
Linear Predictive Coding
Srishti Kakade
 
Multimedia seminar ppt
Multimedia seminar pptMultimedia seminar ppt
Multimedia seminar ppt
Anandi Kumari
 
Subband Coding
Subband CodingSubband Coding
Subband Coding
Mihika Shah
 
Audio and video compression
Audio and video compressionAudio and video compression
Audio and video compression
neeraj9217
 
Speech coding standards2
Speech coding standards2Speech coding standards2
Speech coding standards2
elroy25
 
Lecture 18 (5)
Lecture 18 (5)Lecture 18 (5)
Lecture 18 (5)
Deepakkumar5880
 
DSP_FOEHU - Lec 13 - Digital Signal Processing Applications I
DSP_FOEHU - Lec 13 - Digital Signal Processing Applications IDSP_FOEHU - Lec 13 - Digital Signal Processing Applications I
DSP_FOEHU - Lec 13 - Digital Signal Processing Applications I
Amr E. Mohamed
 
Speech coding std
Speech coding stdSpeech coding std
Speech coding std
Swapnil Sonawane
 
Digital modeling of speech signal
Digital modeling of speech signalDigital modeling of speech signal
Digital modeling of speech signal
Vinodhini
 
Digital Signal Processing-Digital Filters
Digital Signal Processing-Digital FiltersDigital Signal Processing-Digital Filters
Digital Signal Processing-Digital Filters
Nelson Anand
 
Signals&Systems: Quick pointers to Fundamentals
Signals&Systems: Quick pointers to FundamentalsSignals&Systems: Quick pointers to Fundamentals
Signals&Systems: Quick pointers to Fundamentals
Minakshi Atre
 
Introduction to dsp by bibhu prasad ganthia
Introduction to dsp by bibhu prasad ganthiaIntroduction to dsp by bibhu prasad ganthia
Introduction to dsp by bibhu prasad ganthia
Dr. Bibhu Prasad Ganthia
 
Audio compression 1
Audio compression 1Audio compression 1
Audio compression 1
Rajat Kumar
 
Wireless digital communication and coding techniques new
Wireless digital communication and coding techniques newWireless digital communication and coding techniques new
Wireless digital communication and coding techniques new
Clyde Lettsome
 
Lecture 8 audio compression
Lecture 8 audio compressionLecture 8 audio compression
Lecture 8 audio compression
Mr SMAK
 
Environmental Sound detection Using MFCC technique
Environmental Sound detection Using MFCC techniqueEnvironmental Sound detection Using MFCC technique
Environmental Sound detection Using MFCC technique
Pankaj Kumar
 
Text independent speaker recognition system
Text independent speaker recognition systemText independent speaker recognition system
Text independent speaker recognition system
Deepesh Lekhak
 

What's hot (20)

Linear predictive coding documentation
Linear predictive coding  documentationLinear predictive coding  documentation
Linear predictive coding documentation
 
lpc and horn noise detection
lpc and horn noise detectionlpc and horn noise detection
lpc and horn noise detection
 
multirate signal processing for speech
multirate signal processing for speechmultirate signal processing for speech
multirate signal processing for speech
 
Linear Predictive Coding
Linear Predictive CodingLinear Predictive Coding
Linear Predictive Coding
 
Multimedia seminar ppt
Multimedia seminar pptMultimedia seminar ppt
Multimedia seminar ppt
 
Subband Coding
Subband CodingSubband Coding
Subband Coding
 
Audio and video compression
Audio and video compressionAudio and video compression
Audio and video compression
 
Speech coding standards2
Speech coding standards2Speech coding standards2
Speech coding standards2
 
Lecture 18 (5)
Lecture 18 (5)Lecture 18 (5)
Lecture 18 (5)
 
DSP_FOEHU - Lec 13 - Digital Signal Processing Applications I
DSP_FOEHU - Lec 13 - Digital Signal Processing Applications IDSP_FOEHU - Lec 13 - Digital Signal Processing Applications I
DSP_FOEHU - Lec 13 - Digital Signal Processing Applications I
 
Speech coding std
Speech coding stdSpeech coding std
Speech coding std
 
Digital modeling of speech signal
Digital modeling of speech signalDigital modeling of speech signal
Digital modeling of speech signal
 
Digital Signal Processing-Digital Filters
Digital Signal Processing-Digital FiltersDigital Signal Processing-Digital Filters
Digital Signal Processing-Digital Filters
 
Signals&Systems: Quick pointers to Fundamentals
Signals&Systems: Quick pointers to FundamentalsSignals&Systems: Quick pointers to Fundamentals
Signals&Systems: Quick pointers to Fundamentals
 
Introduction to dsp by bibhu prasad ganthia
Introduction to dsp by bibhu prasad ganthiaIntroduction to dsp by bibhu prasad ganthia
Introduction to dsp by bibhu prasad ganthia
 
Audio compression 1
Audio compression 1Audio compression 1
Audio compression 1
 
Wireless digital communication and coding techniques new
Wireless digital communication and coding techniques newWireless digital communication and coding techniques new
Wireless digital communication and coding techniques new
 
Lecture 8 audio compression
Lecture 8 audio compressionLecture 8 audio compression
Lecture 8 audio compression
 
Environmental Sound detection Using MFCC technique
Environmental Sound detection Using MFCC techniqueEnvironmental Sound detection Using MFCC technique
Environmental Sound detection Using MFCC technique
 
Text independent speaker recognition system
Text independent speaker recognition systemText independent speaker recognition system
Text independent speaker recognition system
 

Similar to Speech Compression using LPC

G010424248
G010424248G010424248
G010424248
IOSR Journals
 
H0814247
H0814247H0814247
H0814247
IOSR Journals
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
ijceronline
 
Speech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderSpeech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using Vocoder
IJTET Journal
 
Echo Cancellation Algorithms using Adaptive Filters: A Comparative Study
Echo Cancellation Algorithms using Adaptive Filters: A Comparative StudyEcho Cancellation Algorithms using Adaptive Filters: A Comparative Study
Echo Cancellation Algorithms using Adaptive Filters: A Comparative Study
idescitation
 
Lpc vocoder implemented by using matlab
Lpc vocoder implemented by using matlabLpc vocoder implemented by using matlab
Lpc vocoder implemented by using matlab
chakravarthy Gopi
 
Audio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for DepressionAudio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for Depression
ijsrd.com
 
Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognition
phyuhsan
 
Analysis of PEAQ Model using Wavelet Decomposition Techniques
Analysis of PEAQ Model using Wavelet Decomposition TechniquesAnalysis of PEAQ Model using Wavelet Decomposition Techniques
Analysis of PEAQ Model using Wavelet Decomposition Techniques
idescitation
 
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
IRJET Journal
 
IRJET- Pitch Detection Algorithms in Time Domain
IRJET- Pitch Detection Algorithms in Time DomainIRJET- Pitch Detection Algorithms in Time Domain
IRJET- Pitch Detection Algorithms in Time Domain
IRJET Journal
 
Speech Compression Using Wavelets
Speech Compression Using Wavelets Speech Compression Using Wavelets
Speech Compression Using Wavelets
IJMER
 
Coding
CodingCoding
SignalDecompositionTheory.pptx
SignalDecompositionTheory.pptxSignalDecompositionTheory.pptx
SignalDecompositionTheory.pptx
PriyankaDarshana
 
Finite Wordlength Linear-Phase FIR Filter Design Using Babai's Algorithm
Finite Wordlength Linear-Phase FIR Filter Design Using Babai's AlgorithmFinite Wordlength Linear-Phase FIR Filter Design Using Babai's Algorithm
Finite Wordlength Linear-Phase FIR Filter Design Using Babai's Algorithm
CSCJournals
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
eSAT Publishing House
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
eSAT Journals
 
Unit iv wcn main
Unit iv wcn mainUnit iv wcn main
Unit iv wcn main
vilasini rvr
 
Speech encoding techniques
Speech encoding techniquesSpeech encoding techniques
Speech encoding techniques
Hemaraja Nayaka S
 
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
IDES Editor
 

Similar to Speech Compression using LPC (20)

G010424248
G010424248G010424248
G010424248
 
H0814247
H0814247H0814247
H0814247
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Speech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using VocoderSpeech Analysis and synthesis using Vocoder
Speech Analysis and synthesis using Vocoder
 
Echo Cancellation Algorithms using Adaptive Filters: A Comparative Study
Echo Cancellation Algorithms using Adaptive Filters: A Comparative StudyEcho Cancellation Algorithms using Adaptive Filters: A Comparative Study
Echo Cancellation Algorithms using Adaptive Filters: A Comparative Study
 
Lpc vocoder implemented by using matlab
Lpc vocoder implemented by using matlabLpc vocoder implemented by using matlab
Lpc vocoder implemented by using matlab
 
Audio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for DepressionAudio/Speech Signal Analysis for Depression
Audio/Speech Signal Analysis for Depression
 
Voice biometric recognition
Voice biometric recognitionVoice biometric recognition
Voice biometric recognition
 
Analysis of PEAQ Model using Wavelet Decomposition Techniques
Analysis of PEAQ Model using Wavelet Decomposition TechniquesAnalysis of PEAQ Model using Wavelet Decomposition Techniques
Analysis of PEAQ Model using Wavelet Decomposition Techniques
 
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
A Noise Reduction Method Based on Modified Least Mean Square Algorithm of Rea...
 
IRJET- Pitch Detection Algorithms in Time Domain
IRJET- Pitch Detection Algorithms in Time DomainIRJET- Pitch Detection Algorithms in Time Domain
IRJET- Pitch Detection Algorithms in Time Domain
 
Speech Compression Using Wavelets
Speech Compression Using Wavelets Speech Compression Using Wavelets
Speech Compression Using Wavelets
 
Coding
CodingCoding
Coding
 
SignalDecompositionTheory.pptx
SignalDecompositionTheory.pptxSignalDecompositionTheory.pptx
SignalDecompositionTheory.pptx
 
Finite Wordlength Linear-Phase FIR Filter Design Using Babai's Algorithm
Finite Wordlength Linear-Phase FIR Filter Design Using Babai's AlgorithmFinite Wordlength Linear-Phase FIR Filter Design Using Babai's Algorithm
Finite Wordlength Linear-Phase FIR Filter Design Using Babai's Algorithm
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
 
Unit iv wcn main
Unit iv wcn mainUnit iv wcn main
Unit iv wcn main
 
Speech encoding techniques
Speech encoding techniquesSpeech encoding techniques
Speech encoding techniques
 
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
Effect of Time Derivatives of MFCC Features on HMM Based Speech Recognition S...
 

Speech Compression using LPC

Adaptive Signal Processing Term Paper 2015 DISHA MODI (Roll No:15MECC12)

Speech Compression using LPC
Disha Modi, M.Tech (Communication), Electronics and Communication Department, Institute of Technology - Nirma University

Abstract—The past decade has seen progress towards the adoption of low-rate speech coders in public and military communications. Essential to this progress has been the ability of new speech coders to achieve high-quality speech at low data rates. These coders include mechanisms that represent the spectral properties of speech, provide speech waveform matching, and improve coder performance for the human ear. Several of these coders have been adopted in cellular telephony standards. Service providers are continually faced with the challenge of accommodating more users within a limited allocated bandwidth in mobile communication services. For this reason, service providers are constantly in search of low bit-rate speech coders that deliver high-quality speech. In this paper, low bit-rate coding of speech signals using Linear Predictive Coding (LPC) was implemented and simulated in MATLAB.

Index Terms—Autocorrelation, Formants, LPC, Levinson-Durbin recursion.

I. INTRODUCTION
"LPC was first introduced as a method for encoding human speech by the United States Department of Defense in federal standard 1015, published in 1984" [1]. The vocal tract, in which human speech is produced, can be approximated as a tube of varying diameter. The linear predictive coding (LPC) model is a mathematical approximation of the vocal tract represented by this tube. At any given time, the speech sample is approximated as a linear sum of the p previous samples. The central element of LPC is the linear predictive filter, which determines the value of the next sample as a linear combination of previous samples. "In normal scenario, speech is sampled at 8000 samples/second with 8 bits quantization. This delivers data rate of 64000 bits/second. Linear predictive coding drops this to 2400 bits/second." [1].
At this rate the speech has a distinctly synthetic sound and there is an obvious loss of quality. However, the speech is still audible and easily understood. LPC is therefore a lossy form of compression. Lossy algorithms are sometimes considered acceptable because the loss of quality is often undetectable to the human ear. In conversation, silence takes up more than 50% of the time, so an easy way to save bandwidth is not to transmit the silence. Another important property of speech production is that, because of the mechanics of the vocal tract, there is a high correlation between adjacent samples of speech.

II. LPC SYSTEM IMPLEMENTATION
The filter model used in LPC is known as the linear predictive filter. The system has two key components: analysis (encoding) and synthesis (decoding).

III. LPC ANALYSIS/ENCODING
The encoding part of LPC involves examining the speech signal and breaking it down into segments.
Fig. 1 LPC encoder block-diagram
Linear prediction (LP) methods have been used in control and information theory, where they are called methods of system estimation and system identification. They are used extensively in speech processing under the following group of names, taken from [7]:
1. covariance method
2. autocorrelation method
3. lattice method
4. inverse filter formulation
5. spectral estimation formulation
6. maximum likelihood method
7. inner product method

A. Input speech
Under normal conditions, the input signal is sampled at a rate of 8000 samples per second. The input signal is then broken down into segments, which are encoded and transmitted to the receiver. The 8000 samples in each second of speech are broken into segments of 180 samples, so each segment represents 180/8000 s = 22.5 milliseconds of the input speech signal.

B. Voiced/Unvoiced Determination
According to the LPC algorithm, before a speech segment is classified as voiced or unvoiced, it is first passed through a low-pass filter with a bandwidth of 1 kHz.
It is important to determine whether a segment is voiced or unvoiced because voiced sounds have a different waveform than unvoiced sounds. The LPC encoder informs the decoder whether a segment is voiced or unvoiced by sending a single bit. Voiced sounds are generally vowels and can be modeled as a pulse train that resembles a periodic waveform. These sounds have large amplitudes and high energy levels. Voiced sounds also have distinct formant, or resonant, frequencies. Unvoiced sounds are usually consonants and often have random, chaotic waveforms. They have smaller amplitudes than voiced sounds and therefore less energy. The voiced/unvoiced decision is therefore made by counting the number of times the waveform crosses the x-axis and comparing that count with the normal range of values (threshold values) for most unvoiced and voiced sounds.
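The segmentation of Section A and the zero-crossing decision of Section B can be sketched in a few lines of Python. This is only an illustrative sketch, not the MATLAB implementation used for this paper: the function names, the threshold of 40 crossings per segment, and the two synthetic test signals are assumptions made for the example, and the 1 kHz low-pass pre-filter is omitted.

```python
import math
import random

SAMPLE_RATE = 8000   # samples/second, as stated in Section A
FRAME_LEN = 180      # samples per segment, as stated in Section A
ZC_THRESHOLD = 40    # hypothetical decision threshold (crossings per segment)


def split_into_frames(signal, frame_len=FRAME_LEN):
    """Split a list of samples into consecutive full segments (a short tail is dropped)."""
    n_frames = len(signal) // frame_len
    return [signal[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def zero_crossings(frame):
    """Count sign changes between adjacent samples, i.e. crossings of the x-axis."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def is_voiced(frame, threshold=ZC_THRESHOLD):
    """Few crossings suggest a periodic (voiced) segment, many suggest a noisy (unvoiced) one."""
    return zero_crossings(frame) < threshold


# Duration of one segment in milliseconds: 1000 * 180 / 8000 = 22.5
frame_ms = 1000 * FRAME_LEN / SAMPLE_RATE

# Synthetic stand-ins for speech: a 100 Hz tone (voiced-like) and uniform noise (unvoiced-like).
rng = random.Random(0)
tone = [math.sin(2 * math.pi * 100 * n / SAMPLE_RATE + 0.1) for n in range(FRAME_LEN)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(FRAME_LEN)]

for name, frame in (("tone", tone), ("noise", noise)):
    label = "voiced" if is_voiced(frame) else "unvoiced"
    print(name, zero_crossings(frame), label)
```

In practice the threshold would be tuned on real speech, and the crossing count is often combined with the segment energy, since Section B notes that unvoiced sounds also carry less energy.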
C. Pitch Period Estimation
The pitch period can be thought of as the period of the vocal cord vibration that occurs during the production of voiced speech. Therefore, the pitch period is only required for decoding voiced segments; it is not needed for unvoiced segments, since these are produced by turbulent air flow rather than by vocal cord vibration. One type of algorithm takes advantage of the fact that the autocorrelation of a periodic function, Rxx(k), has a maximum when k equals the pitch period. These algorithms usually detect the maximum by checking the autocorrelation value against a threshold value. One problem with algorithms that use autocorrelation is that the validity of their results is susceptible to interference from other resonances in the vocal tract. When interference occurs, the algorithm cannot guarantee accurate results. Another problem arises because voiced speech is not perfectly periodic, which means that the maximum will be lower than it would be for a truly periodic signal.

D. Vocal Tract Filter
The filter used by the decoder to re-create the original input signal is built from a set of coefficients. To find the filter coefficients $a_1, \ldots, a_M$ that best match the current segment, the encoder minimizes the mean squared error between each sample $y_n$ and its prediction from the previous $M$ samples:

$$e_n = y_n - \sum_{i=1}^{M} a_i y_{n-i}$$

$$\frac{\partial}{\partial a_j} E\left[\left(y_n - \sum_{i=1}^{M} a_i y_{n-i}\right)^2\right] = 0$$

$$-2E\left[\left(y_n - \sum_{i=1}^{M} a_i y_{n-i}\right) y_{n-j}\right] = 0$$

$$\sum_{i=1}^{M} a_i E[y_{n-i} y_{n-j}] = E[y_n y_{n-j}], \qquad j = 1, \ldots, M$$

(The derivative can be taken inside the expectation because the expectation is linear.) Taking the derivative with respect to each coefficient yields a set of M equations. To solve for the filter coefficients, $E[y_{n-i} y_{n-j}]$ has to be estimated. The autocorrelation approach is the one explained here for linear predictive coding. It requires two initial assumptions about the sequence of speech samples, $\{y_n\}$, in the current segment: first, that $\{y_n\}$ is stationary, and second, that $\{y_n\}$ is zero outside of the current segment.
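Both the pitch search of Section C and the autocorrelation approach introduced above work on a single segment whose samples are taken as zero outside of the segment. The following Python sketch shows one way this could look; it is not the implementation used in this paper. The function names, the lag range of 20 to 120 samples, the normalized threshold of 0.3, and the synthetic decaying pulse train are all assumptions chosen for illustration.

```python
def autocorr(frame, k):
    """Short-time autocorrelation at lag k.

    Samples outside the segment are treated as zero, so the sum
    only runs over index pairs that both fall inside the segment.
    """
    return sum(frame[n] * frame[n - k] for n in range(k, len(frame)))


def estimate_pitch_period(frame, min_lag=20, max_lag=120, threshold=0.3):
    """Return the lag (in samples) with the largest autocorrelation.

    Returns None when the segment is silent or when the peak, normalized
    by the lag-0 value, stays below the (hypothetical) threshold.
    """
    r0 = autocorr(frame, 0)
    if r0 == 0:
        return None
    values = {k: autocorr(frame, k) for k in range(min_lag, max_lag + 1)}
    best_lag = max(values, key=values.get)
    if values[best_lag] / r0 < threshold:
        return None
    return best_lag


# Synthetic voiced-like segment: a decaying pulse repeated every 50 samples.
SAMPLE_RATE = 8000
frame = [0.8 ** (n % 50) for n in range(180)]

period = estimate_pitch_period(frame)
print(period, SAMPLE_RATE / period)   # 50 samples, i.e. 160.0 Hz
```

Because the sum contains fewer terms as the lag grows, the peak at twice the pitch period is smaller than the peak at the pitch period itself, which is why the search returns 50 rather than 100 for this test signal.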
In the autocorrelation approach, each $E[y_{n-i} y_{n-j}]$ is converted into an autocorrelation function of the form Ryy(|i-j|). The estimate of the autocorrelation function Ryy(k) can be expressed as follows, where $N$ is the number of samples in the segment and $y_n$ is taken as zero outside of it:

$$R_{yy}(k) = \sum_{n=k}^{N-1} y_n y_{n-k}$$

Using Ryy(k), the M equations obtained from the derivative of the mean squared error can be written in matrix form RA = P, where A contains the filter coefficients:

$$R = \begin{bmatrix} R_{yy}(0) & R_{yy}(1) & \cdots & R_{yy}(M-1) \\ R_{yy}(1) & R_{yy}(0) & \cdots & R_{yy}(M-2) \\ \vdots & \vdots & \ddots & \vdots \\ R_{yy}(M-1) & R_{yy}(M-2) & \cdots & R_{yy}(0) \end{bmatrix}, \quad A = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_M \end{bmatrix}, \quad P = \begin{bmatrix} R_{yy}(1) \\ R_{yy}(2) \\ \vdots \\ R_{yy}(M) \end{bmatrix}$$

To determine the filter coefficients, the equation $A = R^{-1}P$ must be solved. This cannot be done without first computing $R^{-1}$. The computation is easy once one observes that R is symmetric and that all elements along each diagonal are equal. This type of matrix is called a Toeplitz matrix and can be inverted efficiently [1]. The Levinson-Durbin (L-D) algorithm is a recursive algorithm that is computationally very efficient because it takes advantage of these properties of R when determining the filter coefficients.

L-D Algorithm [2]
The basic ideas behind the recursion are, first, that it is easy to solve the system for k = 1 and, second, that it is also simple to solve a problem with k + 1 coefficients once the problem with k coefficients has been solved. In general none of the coefficients of the different sized problems match, so the recursion is not a way to calculate the single coefficient $a_{k+1}$ but a way to calculate the whole vector $A_{k+1}$ as a function of $A_k$, $R_{k+1}$ and $E_k$. In that sense, "Levinson-Durbin induction" would be a better name.

In this subsection, following [2], $r_k$ stands for $R_{yy}(k)$, $R_k$ is the $(k+1) \times (k+1)$ symmetric Toeplitz matrix with entries $r_{|i-j|}$, $A_k = [1, a_1, \ldots, a_k]^T$ is the prediction-error filter of size $k$ (its coefficients are the negatives of the predictor coefficients in RA = P), and $E_k$ is the corresponding prediction error.

For the size 1 problem, we are looking for $A_1 = [1, a_1]^T$ so that

$$R_1 A_1 = \begin{bmatrix} E_1 \\ 0 \end{bmatrix} \quad \text{with} \quad R_1 = \begin{bmatrix} r_0 & r_1 \\ r_1 & r_0 \end{bmatrix}$$

and the value of $E_1$ is not necessary at this stage. The dot product of the second row of $R_1$ with $A_1$ gives

$$r_1 + r_0 a_1 = 0$$

Therefore,

$$a_1 = -\frac{r_1}{r_0} \quad \text{and} \quad E_1 = r_0 + r_1 a_1$$

Solving the size k+1 problem
Suppose that we have solved the size k problem and have found $A_k$ and $E_k$ for the matrix $R_k$. Then we have

$$R_k A_k = [E_k, 0, \ldots, 0]^T$$

$R_{k+1}$ has one more row and column than $R_k$, so we cannot apply it directly to $A_k$. However, if we extend $A_k$ with a zero and call this vector $U_{k+1}$, we can apply $R_{k+1}$ to it and we get the following interesting result (with $a_0 = 1$):

$$R_{k+1} U_{k+1} = \left[E_k, 0, \ldots, 0, \sum_{j=0}^{k} a_j r_{k+1-j}\right]^T$$

Since the matrix is symmetric, we also get something remarkable when reversing the order of the coefficients of $U_{k+1}$ and calling this vector $V_{k+1}$:

$$R_{k+1} V_{k+1} = \left[\sum_{j=0}^{k} a_j r_{k+1-j}, 0, \ldots, 0, E_k\right]^T$$

We can notice that a linear combination $U_{k+1} + \lambda V_{k+1}$ is of the form wanted for $A_{k+1}$, since the first element is a 1 for all values of $\lambda$. Now if there were a value of $\lambda$ for which every element of $R_{k+1}(U_{k+1} + \lambda V_{k+1})$ except the first is zero, this combination would be the solution $A_{k+1}$. Calculating $R_{k+1}(U_{k+1} + \lambda V_{k+1})$ gives

$$R_{k+1}(U_{k+1} + \lambda V_{k+1}) = \left[E_k + \lambda \sum_{j=0}^{k} a_j r_{k+1-j}, 0, \ldots, 0, \sum_{j=0}^{k} a_j r_{k+1-j} + \lambda E_k\right]^T$$

so the last element vanishes for

$$\lambda = -\frac{1}{E_k} \sum_{j=0}^{k} a_j r_{k+1-j}$$

which gives

$$A_{k+1} = U_{k+1} + \lambda V_{k+1}, \qquad E_{k+1} = (1 - \lambda^2) E_k$$

IV. TRANSMITTING THE PARAMETERS [1]
In its original form, speech is usually transmitted at 64,000 bits/second, using 8 bits/sample and a sampling rate of 8000 Hz. LPC reduces this rate to 2,400 bits/second by breaking the speech into segments and then sending the voiced/unvoiced information, the pitch period, and the coefficients of the filter that represents the vocal tract for each segment. The input (excitation) signal used by the filter at the receiver is determined by the classification of the speech segment as voiced or unvoiced and by the pitch period of the segment. The encoder transmits a single bit to indicate whether the current segment is voiced or unvoiced. The pitch period is quantized, and 6 bits are required to represent it. If the segment contains voiced speech, then a 10th-order filter is used, which means that 11 values are needed: 10 reflection coefficients and the gain. If the segment contains unvoiced speech, then a 4th-order filter is used, which means that 5 values are needed: 4 reflection coefficients and the gain. Quantization is done as follows:

 1 bit   voiced/unvoiced
 6 bits  pitch period (60 values)
10 bits  k1 and k2 (5 each)
10 bits  k3 and k4 (5 each)
16 bits  k5, k6, k7, k8 (4 each)
 3 bits  k9
 2 bits  k10
 5 bits  gain G
 1 bit   synchronization
54 bits  TOTAL BITS PER FRAME

Verification for Bit Rate of LPC Speech Segments
Sample rate = 8000 samples/second
Samples per segment = 180 samples/segment
Segment rate = Sample rate / Samples per segment = (8000 samples/second) / (180 samples/segment) = 44.44 segments/second
Segment size = 54 bits/segment
Bit rate = Segment size * Segment rate = (54 bits/segment) * (44.44 segments/second) = 2400 bits/second

V. LPC SYNTHESIS/DECODING
Fig. 2 LPC synthesizer/decoder block-diagram [4]
The process of decoding a sequence of speech segments is the reverse of the encoding process. Each segment is decoded individually, and the reproduced sound segments are joined together to represent the entire input speech signal. The decoding, or synthesis, of a speech segment is based on the 54 bits of information transmitted from the encoder. Each segment of speech has a different LPC filter, which is produced from the reflection coefficients and the gain received from the encoder. 10 reflection coefficients are used for voiced segment filters and 4 reflection coefficients are used for unvoiced segments. These reflection coefficients are used to generate the vocal tract coefficients, or parameters, from which the filter is created. The final step in decoding a segment of speech is to pass the excitation signal through the filter to produce the synthesized speech signal.

VI. APPLICATION
In general, the most common use of speech compression is in standard telephone systems. In fact, a lot of the technology
used in speech compression was developed by the phone companies. Further applications of LPC and other speech compression schemes are voice mail systems, telephone answering machines, and multimedia applications. Most multimedia applications, unlike telephone applications, involve one-way communication and involve storing the data.

SIMULATION RESULTS
Low bit-rate coding of different speech signals using Linear Predictive Coding (LPC) was simulated in MATLAB.
Fig. 3 Female Original Voice
Fig. 4 Female LPC coded Voice
Fig. 5 Male Original Voice
Fig. 6 Male LPC coded Voice
Performance measurements of the LPC-compressed signals (both male and female) are shown in Table I. Looking at the SNR values in Table I, both the male and the female signals are noisy, since both have a low SNR. It was observed that, for all levels of compression, the quality is better for the male signal than for the female signal (17.077 dB versus 14.77 dB); on the other hand, the compression ratio is larger for the female signal than for the male signal. This result is expected because the female voice contains more high-frequency content than the male voice. It was also observed that no further improvement can be achieved beyond a certain level of decomposition for either signal.

PARAMETER                        MALE      FEMALE
Sampling Rate (samples/second)   8000      8000
File length (in seconds)         2.07      2.77
Length of Original Signal        99328     133120
Length of Constructed Signal     97920     132480
SNR (in dB)                      17.077    14.77
Compression Ratio                0.9858    0.9952
Table I Comparison of male and female LPC synthesized voice

CONCLUSION
Linear Predictive Coding is an analysis/synthesis technique for lossy speech compression that attempts to model the human production of sound instead of transmitting an estimate of the sound wave. Linear predictive coding achieves a bit rate of 2400 bits/second, which makes it well suited for use in secure telephone systems.
Secure telephone systems are more concerned that the content and meaning of speech, rather than the quality of speech, be preserved. The tradeoff for LPC's low bit rate is that it has some difficulty with certain sounds and produces speech that sounds synthetic. Linear predictive coding encoders break a sound signal into segments and then send information on each segment to the decoder. The encoder sends information on whether the segment is voiced or unvoiced, together with the pitch period for voiced segments, which is used to create an excitation signal in the decoder. The encoder also sends information about the vocal tract, which is used to build a filter on the decoder side; when given the excitation signal as input, this filter can reproduce an approximation of the original speech.

REFERENCES
[1] J. Bradbury, "Linear Predictive Coding," 2000.
[2] C. Collomb, "1. Description of Linear Prediction 2. Minimizing the error," pp. 1-7, 2009.
[3] D. R. Sandeep, "Compression and Enhancement of Speech Signals," no. Seiscon, pp. 774-779, 2011.
[4] M. A. Osman, N. Al, H. M. Magboub, and S. A. Alfandi, "Speech compression uses LPC and wavelet," pp. 92-99, 2010.
[5] V. Hardman and O. Hodson, Internet/Mbone Audio (2000) 5-7.
[6] Scott C. Douglas, Introduction to Adaptive Filters, Digital Signal Processing Handbook (1999) 7-12.
[7] D. S. Processing, "Digital Speech Processing, Lecture 13: Linear Predictive Coding (LPC), Introduction, LPC Methods."
Poor, H. V., Looney, C. G., Marks II, R. J., Verdú, S., Thomas, J. A., Cover, T. M., Information Theory, The Electrical Engineering Handbook (2000) 56-57.
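As a supplement to the L-D Algorithm subsection of Section III-D, the recursion can be sketched in Python. Each step extends the current solution with a zero, reverses it, and combines the two vectors with a scalar weight (called `lam` in the code) chosen so that the last element of the right-hand side vanishes. This is a minimal sketch under the prediction-error convention of [2], where the solution vector starts with 1; the function name, the variable names, and the test autocorrelation values are assumptions made for illustration and are not taken from the MATLAB simulation.

```python
def levinson_durbin(r, order):
    """Solve the symmetric Toeplitz system built from autocorrelations r[0..order].

    Returns (a, lams, error) where
      a     = [1, a_1, ..., a_order], the prediction-error filter,
      lams  = the weight found at each step (the reflection coefficients,
              up to a sign convention),
      error = the final prediction error E_order.
    """
    a = [1.0]
    error = float(r[0])
    lams = []
    for k in range(order):
        # Last element of R_{k+1} applied to the zero-extended solution.
        acc = sum(a[j] * r[k + 1 - j] for j in range(k + 1))
        lam = -acc / error
        u = a + [0.0]      # current solution extended with a zero
        v = u[::-1]        # the same vector with its coefficients reversed
        a = [ui + lam * vi for ui, vi in zip(u, v)]
        error *= 1.0 - lam * lam
        lams.append(lam)
    return a, lams, error


def toeplitz_times(r, a):
    """Multiply the symmetric Toeplitz matrix with entries r[|i-j|] by the vector a."""
    n = len(a)
    return [sum(r[abs(i - j)] * a[j] for j in range(n)) for i in range(n)]


# Hypothetical autocorrelation values for a quick check.
r = [1.0, 0.5, 0.2, 0.1]
a, lams, error = levinson_durbin(r, order=3)

# The product should be [error, 0, 0, 0] up to rounding.
print(toeplitz_times(r, a), error)

# Predictor coefficients in the sign convention of RA = P.
predictor = [-c for c in a[1:]]
print(predictor)
```

The recursion needs on the order of M squared operations for an order-M filter, whereas a general matrix inversion needs on the order of M cubed, which is the efficiency gain attributed to the L-D algorithm in Section III-D.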