
CHAPTER 1 INTRODUCTION

Linear predictive coding (LPC) is a tool used mostly in audio signal processing and speech processing for representing the spectral envelope of a digital speech signal in compressed form, using the information of a linear predictive model. It is one of the most powerful speech analysis techniques and one of the most useful methods for encoding good-quality speech at a low bit rate, and it provides accurate estimates of speech parameters.

A vocoder (/ˈvoʊkoʊdər/, short for voice encoder) is an analysis/synthesis system used to reproduce human speech. In the encoder, the input is passed through a multiband filter, each band is passed through an envelope follower, and the control signals from the envelope followers are sent to the decoder. The decoder applies these (amplitude) control signals to corresponding filters in the synthesizer. Because the control signals change only slowly compared to the original speech waveform, the bandwidth required to transmit speech can be reduced. This allows more speech channels to share a radio circuit or submarine cable. By encoding the control signals, voice transmission can also be secured against interception.

The vocoder was originally developed in the 1930s as a speech coder for telecommunications applications, the idea being to code speech for transmission. Transmitting the parameters of a speech model instead of a digitized representation of the speech waveform saves bandwidth in the communication channel, because the parameters of the model change relatively slowly compared to the changes in the speech waveform that they describe. Its primary use in this fashion is secure radio communication, where voice has to be encrypted and then transmitted. The advantage of this method of "encryption" is that the signal itself is not sent, only the envelopes of the bandpass filters. The receiving unit needs to be set up in the same channel configuration to resynthesize a version of the original signal spectrum.

The vocoder, as both hardware and software, has also been used extensively as an electronic musical instrument.

Whereas the vocoder analyzes speech, transforms it into electronically transmitted information, and recreates it, the Voder (from Voice Operating Demonstrator) generates synthesized speech by means of a console with fifteen touch-sensitive keys and a pedal. It basically consists of the "second half" of the vocoder, but with manual filter controls, and it needs a highly trained operator.

Since the late 1970s, most non-musical vocoders have been implemented using linear prediction, whereby the target signal's spectral envelope (formant) is estimated by an all-pole IIR filter. In linear predictive coding, the all-pole filter replaces the bandpass filter bank of its predecessor and is used at the encoder to whiten the signal (i.e., flatten the spectrum) and again at the decoder to re-apply the spectral shape of the target speech signal.

1.1 Organization of the project

Chapter 1: Introduction
Chapter 2: General Theory
Chapter 3: Block Diagram Description
Chapter 4: Software Description
Chapter 5: Results and Conclusion
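The encoder/decoder symmetry described above (the same all-pole model whitens the speech at the encoder and re-applies its spectral shape at the decoder) can be sketched in a few lines of Python, used here purely for illustration. This is not the project's implementation: it assumes the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, and the coefficients are hypothetical values chosen only so that 1/A(z) is stable.

```python
def analysis_filter(x, a):
    """Inverse (whitening) filter A(z) = a[0] + a[1] z^-1 + ... + a[p] z^-p.

    a[0] must be 1. Returns the prediction residual e(n).
    """
    p = len(a) - 1
    e = []
    for n in range(len(x)):
        acc = 0.0
        for i in range(p + 1):
            if n - i >= 0:
                acc += a[i] * x[n - i]
        e.append(acc)
    return e


def synthesis_filter(e, a):
    """All-pole filter 1/A(z): y(n) = e(n) - sum_{i=1..p} a[i] y(n - i)."""
    p = len(a) - 1
    y = []
    for n in range(len(e)):
        acc = e[n]
        for i in range(1, p + 1):
            if n - i >= 0:
                acc -= a[i] * y[n - i]
        y.append(acc)
    return y


if __name__ == "__main__":
    a = [1.0, -1.3, 0.64]           # hypothetical A(z); poles at radius 0.8
    x = [1.0, 0.5, -0.25, 0.8, 0.0, -0.6, 0.3, 0.1]
    e = analysis_filter(x, a)       # encoder side: whiten
    y = synthesis_filter(e, a)      # decoder side: re-apply spectral shape
    print(max(abs(u - v) for u, v in zip(x, y)))  # close to zero (rounding only)
```

Because the decoder runs the exact inverse of the encoder filter, this round trip reproduces the input up to floating-point rounding. In a real vocoder only quantized parameters and a modeled excitation are transmitted, so the reconstruction is approximate.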
CHAPTER 2 GENERAL THEORY

2.1 Overview

LPC starts with the assumption that a speech signal is produced by a buzzer at the end of a tube (voiced sounds), with occasional added hissing and popping sounds (sibilants and plosive sounds). Although apparently crude, this model is actually a close approximation of the reality of speech production. The glottis (the space between the vocal folds) produces the buzz, which is characterized by its intensity (loudness) and frequency (pitch). The vocal tract (the throat and mouth) forms the tube, which is characterized by its resonances; these give rise to formants, or enhanced frequency bands in the sound produced. Hisses and pops are generated by the action of the tongue, lips and throat during sibilants and plosives.

LPC analyzes the speech signal by estimating the formants, removing their effects from the speech signal, and estimating the intensity and frequency of the remaining buzz. The process of removing the formants is called inverse filtering, and the signal remaining after the modeled signal has been subtracted is called the residue.

The numbers that describe the intensity and frequency of the buzz, the formants, and the residue signal can be stored or transmitted elsewhere. LPC synthesizes the speech signal by reversing the process: it uses the buzz parameters and the residue to create a source signal, uses the formants to create a filter (which represents the tube), and runs the source through the filter, resulting in speech.

Because speech signals vary with time, this process is done on short chunks of the speech signal called frames; generally 30 to 50 frames per second give intelligible speech with good compression.

2.2 LPC coefficient representations

LPC is frequently used for transmitting spectral envelope information, and as such it has to be tolerant of transmission errors. Transmitting the filter coefficients directly (see linear prediction for the definition of the coefficients) is undesirable, because they are very sensitive to errors. In other words, a very small error can distort the whole spectrum, or worse, a small error might make the prediction filter unstable.

There are more advanced representations, such as log area ratios (LAR), line spectral pair (LSP) decomposition and reflection coefficients. Of these, LSP decomposition in particular has gained popularity, since it ensures stability of the predictor and spectral errors remain local for small coefficient deviations.

Log area ratios (LAR)

LARs can be used to represent reflection coefficients (another form of linear prediction coefficients) for transmission over a channel. While not as efficient as line spectral pairs (LSPs), log area ratios are much simpler to compute. Let $k_i$ be the $i$th reflection coefficient of a filter; the $i$th LAR is:

$$A_i = \log\left(\frac{1 + k_i}{1 - k_i}\right)$$

Log area ratios have now mostly been replaced by line spectral pairs, but older codecs, such as GSM-FR, use LARs.

Line spectral pairs

Line spectral pairs (LSP), or line spectral frequencies (LSF), are used to represent linear prediction coefficients (LPC) for transmission over a channel. LSPs have several properties (e.g. smaller sensitivity to quantization noise) that make them superior to direct quantization of LPCs. For this reason, LSPs are very useful in speech coding.

Mathematical foundation

The LP polynomial $A(z) = 1 + \sum_{i=1}^{p} a_i z^{-i}$ can be decomposed as $A(z) = \tfrac{1}{2}\left[P(z) + Q(z)\right]$, where

$$P(z) = A(z) + z^{-(p+1)} A(z^{-1}), \qquad Q(z) = A(z) - z^{-(p+1)} A(z^{-1})$$

P(z) corresponds to the vocal tract with the glottis closed and Q(z) with the glottis open. While A(z) has complex roots anywhere within the unit circle (z-transform), P(z) and Q(z) have the very useful property (for a minimum-phase A(z)) of having roots only on the unit circle; P is a palindromic polynomial and Q an antipalindromic polynomial. So to find the roots, we evaluate $P(e^{j\omega})$ and $Q(e^{j\omega})$ on a grid of test points $\omega$ between 0 and $\pi$ and look for zero crossings. The zeros (roots) of P(z) and Q(z) are also interspersed, which is why the search can alternate between the two polynomials as roots are found. So the process of finding the LSP frequencies is basically finding the roots of two polynomials of order p + 1. The roots of P(z) and Q(z) occur in symmetrical pairs at $\pm\omega$, hence the name line spectrum pairs (LSPs). Because all the roots are complex and (for even p) two roots are found at 0 and $\pi$, only p/2 roots need to be found for each polynomial. The output of the LSP search thus has p roots, hence the same number of coefficients as the input LPC filter (not counting $a_0 = 1$).

To convert back to LPCs, we evaluate $A(z) = \tfrac{1}{2}\left[P(z) + Q(z)\right]$ by "clocking" an impulse through it N times (N being the order of the filter), yielding the original filter A(z).

Properties

Line spectral pairs have several interesting and useful properties. Stability of the filter is ensured if and only if the roots of P(z) and Q(z) are interleaved, that is, if the line spectral frequencies taken together are monotonically increasing. Moreover, the closer two roots are, the more resonant the filter is at the corresponding frequency. Because LSPs are not overly sensitive to quantization noise and stability is easily ensured, LSPs are widely used for quantizing LPC filters. Line spectral frequencies can also be interpolated.

Reflection coefficient

The reflection coefficient is used in physics and electrical engineering when wave propagation in a medium containing discontinuities is considered. A reflection coefficient describes either the amplitude or the intensity of a reflected wave relative to an incident wave.
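The LAR formula and the P(z)/Q(z) decomposition can be illustrated with a small Python sketch. It assumes the natural logarithm in the LAR (conventions differ between texts, and real codecs may apply their own approximation and scaling), an even filter order p, and a minimum-phase A(z). The grid-plus-bisection root search and its default of 512 grid points are illustrative choices, not a reference implementation.

```python
import cmath
import math


def log_area_ratios(k):
    """LAR_i = ln((1 + k_i) / (1 - k_i)); requires |k_i| < 1."""
    return [math.log((1.0 + ki) / (1.0 - ki)) for ki in k]


def lsp_polynomials(a):
    """Coefficients (powers of z^-1) of P(z), Q(z) for a = [1, a1, ..., ap]."""
    ext = list(a) + [0.0]        # A(z), padded to degree p + 1
    rev = ext[::-1]              # z^-(p+1) A(z^-1)
    p_poly = [u + v for u, v in zip(ext, rev)]   # palindromic
    q_poly = [u - v for u, v in zip(ext, rev)]   # antipalindromic
    return p_poly, q_poly


def _zero_phase(coeffs, w, use_imag):
    """Multiplying by e^{jw(p+1)/2} makes P(e^{jw}) purely real and
    Q(e^{jw}) purely imaginary, so sign changes of this value mark roots."""
    n = len(coeffs) - 1
    val = sum(c * cmath.exp(-1j * w * m) for m, c in enumerate(coeffs))
    val *= cmath.exp(1j * w * n / 2)
    return val.imag if use_imag else val.real


def _grid_roots(coeffs, use_imag, grid):
    """Scan a grid strictly inside (0, pi) for sign changes, then bisect."""
    def f(w):
        return _zero_phase(coeffs, w, use_imag)

    step = math.pi / grid
    pts = [(i + 0.5) * step for i in range(grid)]
    roots = []
    for lo, hi in zip(pts, pts[1:]):
        if f(lo) * f(hi) < 0:
            for _ in range(60):
                mid = 0.5 * (lo + hi)
                if f(lo) * f(mid) <= 0:
                    hi = mid
                else:
                    lo = mid
            roots.append(0.5 * (lo + hi))
    return roots


def line_spectral_frequencies(a, grid=512):
    """Sorted LSFs in radians; the trivial roots at 0 and pi are skipped."""
    p_poly, q_poly = lsp_polynomials(a)
    return sorted(_grid_roots(p_poly, False, grid) + _grid_roots(q_poly, True, grid))


if __name__ == "__main__":
    print(log_area_ratios([0.5, -0.2]))
    print(line_spectral_frequencies([1.0, -1.3, 0.64]))  # approx. [acos(0.83), acos(0.47)]
```

For the example filter A(z) = 1 - 1.3 z^-1 + 0.64 z^-2, P(z) factors as (1 + z^-1)(1 - 1.66 z^-1 + z^-2) and Q(z) as (1 - z^-1)(1 - 0.94 z^-1 + z^-2), so the two LSFs are arccos(0.83) ≈ 0.59 rad (from P) and arccos(0.47) ≈ 1.08 rad (from Q), interleaved as the properties above require.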
The reflection coefficient is closely related to the transmission coefficient.

2.3 Pitch Period Estimation

Knowing whether a segment is voiced or unvoiced is not all the information the LPC decoder needs to reproduce a speech signal accurately. To produce an input signal for the LPC filter, the decoder also needs another attribute of the current speech segment, known as the pitch period. The period of any wave, including speech signals, can be defined as the time required for one wave cycle to pass a fixed position completely. For speech signals, the pitch period can be thought of as the period of the vocal cord vibration that occurs during the production of voiced speech. Therefore, the pitch period is needed only for decoding voiced segments; it is not required for unvoiced segments, since they are produced by turbulent air flow rather than vocal cord vibrations.

Determining the pitch period of a given segment of speech is computationally intensive, and several different types of algorithms can be used. One type of algorithm takes advantage of the fact that the autocorrelation of a periodic function, $R_{xx}(k)$, has a peak (apart from the trivial maximum at k = 0) when the lag k equals the pitch period. These algorithms usually detect this peak by checking the autocorrelation value against a threshold. One problem with autocorrelation-based algorithms is that their results are susceptible to interference from other resonances in the vocal tract; when such interference occurs, the algorithm cannot guarantee accurate results. Another problem arises because voiced speech is not entirely periodic, so the peak will be lower than it would be for a truly periodic signal.

2.4 Applications

LPC is generally used for speech analysis and resynthesis. It is used as a form of voice compression by phone companies, for example in the GSM standard. It is also used for secure wireless communication, where voice must be digitized, encrypted and sent over a narrow voice channel; an early example of this is the US government's Navajo I.

LPC synthesis can be used to construct vocoders in which musical instruments serve as the excitation signal to the time-varying filter estimated from a singer's speech. This is somewhat popular in electronic music.
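The autocorrelation method of pitch period estimation from Section 2.3 can be sketched in Python as follows. The lag range (60 to 400 Hz), the normalized threshold of 0.3 and the function names are hypothetical choices for illustration; restricting the search to plausible pitch lags is what excludes the trivial maximum at k = 0. The test signal is an idealized pulse train standing in for the glottal buzz, so it sidesteps the vocal-tract interference and imperfect periodicity problems described in that section.

```python
def autocorr(x, k):
    """Unnormalized autocorrelation R_xx(k) = sum_n x[n] x[n - k]."""
    return sum(x[n] * x[n - k] for n in range(k, len(x)))


def estimate_pitch_period(x, fs=8000, fmin=60.0, fmax=400.0, threshold=0.3):
    """Return the pitch period in samples, or None if no clear peak is found.

    Only lags between fs/fmax and fs/fmin samples are searched, which
    excludes the trivial maximum at k = 0.
    """
    r0 = autocorr(x, 0)
    if r0 == 0:
        return None                      # silent frame
    kmin = max(1, int(fs / fmax))
    kmax = min(int(fs / fmin), len(x) - 1)
    best_k = max(range(kmin, kmax + 1), key=lambda k: autocorr(x, k))
    if autocorr(x, best_k) / r0 < threshold:
        return None                      # peak too weak: treat as unvoiced
    return best_k


if __name__ == "__main__":
    # Idealized buzz: one pulse every 80 samples (100 Hz at 8000 samples/s)
    frame = [1.0 if n % 80 == 0 else 0.0 for n in range(400)]
    period = estimate_pitch_period(frame)
    print(period, "samples =", 8000 / period, "Hz")   # prints: 80 samples = 100.0 Hz
```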
Paul Lansky made the well-known computer music piece notjustmoreidlechatter using linear predictive coding. A 10th-order LPC was used in the popular 1980s Speak & Spell educational toy. Waveform ROM in some digital sample-based music synthesizers made by Yamaha Corporation may be compressed using the LPC algorithm. LPC predictors are used in Shorten, MPEG-4 ALS, FLAC, and other lossless audio codecs.
2.4.1 Voice effects in music

For musical applications, a source of musical sounds is used as the carrier, instead of extracting the fundamental frequency. For instance, one could use the sound of a synthesizer as the input to the filter bank, a technique that became popular in the 1970s.

One of the earliest people to recognize the potential of the vocoder and Voder for electronic music may have been Werner Meyer-Eppler, a German physicist, experimental acoustician and phoneticist. In 1949 he published a thesis on electronic music and speech synthesis from the viewpoint of sound synthesis, and in 1951 he took part in the successful proposal to establish the WDR Cologne Studio for Electronic Music.

One of the first attempts to divert the vocoder to create music may have been the "Siemens Synthesizer" at the Siemens Studio for Electronic Music, developed between 1956 and 1959. In 1968, Robert Moog developed one of the first solid-state musical vocoders for the electronic music studio of the University at Buffalo.

In 1969, Bruce Haack built a prototype vocoder, named "Farad" after Michael Faraday, and it was featured on his rock album The Electric Lucifer, released the same year.

In 1970 Wendy Carlos and Robert Moog built another musical vocoder, a 10-band device inspired by the vocoder designs of Homer Dudley. It was originally called a spectrum encoder-decoder, and later referred to simply as a vocoder. The carrier signal came from a Moog modular synthesizer, and the modulator from a microphone input. The output of the 10-band vocoder was fairly intelligible, but relied on specially articulated speech. Later improved vocoders use a high-pass filter to let some sibilance through from the microphone; this ruins the device for its original speech-coding application, but it makes the "talking synthesizer" effect much more intelligible.
Carlos and Moog's vocoder was featured in several recordings, including the soundtrack to Stanley Kubrick's A Clockwork Orange, in which the vocoder sang the vocal part of Beethoven's Ninth Symphony. Also featured in the soundtrack was a piece called "Timesteps," which featured the vocoder in two sections. "Timesteps" was originally intended merely as an introduction to vocoders for the "timid listener",
but Kubrick chose to include the piece on the soundtrack, much to the surprise of Wendy Carlos.

Kraftwerk's Autobahn (1974) was one of the first successful pop/rock albums to feature vocoder vocals. Another of the early songs to feature a vocoder was "The Raven" on the 1976 album Tales of Mystery and Imagination by the progressive rock band The Alan Parsons Project; the vocoder was also used on later albums such as I Robot. Following Alan Parsons' example, vocoders began to appear in pop music in the late 1970s, for example on disco recordings. Jeff Lynne of Electric Light Orchestra used the vocoder on several albums, such as Time (featuring the Roland VP-330 Plus MkI). ELO songs such as "Mr. Blue Sky" and "Sweet Talkin' Woman", both from Out of the Blue (1977), use the vocoder extensively. Featured on the album are the EMS Vocoder 2000W MkI and the EMS Vocoder (-System) 2000 (W or B, MkI or II).

2.4.2 Speaker-dependent word recognition device

The speaker-dependent word recognition device is implemented using the Motorola DSP56303. First, the speaker trains the device by storing 10 different vowel sounds in memory. Then the same speaker can repeat one of the ten words associated with the vowel sounds, and the device detects which word was repeated and flags an appropriate output.

[Fig 2.1: Training the Device. Vowel sound input -> Microphone -> A/D converter -> Calculate LPC coefficients -> Store coefficients in memory]

[Fig 2.2: Word Recognition. Vowel sound input -> Microphone -> A/D converter -> Calculate LPC coefficients -> Compare coefficients with the ones in memory -> Output]

2.5 Modern vocoder implementations

Even with the need to record several frequencies, and the additional unvoiced sounds, the compression of the vocoder system is impressive. Standard speech-recording systems capture frequencies from about 500 Hz to 3400 Hz, where most of the frequencies used in speech lie, typically using a sampling rate of 8 kHz (slightly greater than the Nyquist rate of 6.8 kHz for a 3.4 kHz bandwidth). The sampling resolution is typically 12 or more bits per sample (16 is standard), for a final data rate in the range of 96 to 128 kbit/s (8000 samples/s × 12 to 16 bits). However, a good vocoder can provide a reasonably good simulation of voice with as little as 2.4 kbit/s of data.

Toll-quality voice coders, such as ITU G.729, are used in many telephone networks. G.729 in particular has a final data rate of 8 kbit/s with very good voice quality. G.723.1 achieves slightly worse quality at data rates of 5.3 kbit/s and 6.3 kbit/s. Many voice systems use even lower data rates, but below 5 kbit/s voice quality begins to drop rapidly.

Several vocoder systems are used in NSA encryption systems:
- LPC-10, FIPS Pub 137, 2400 bit/s, which uses linear predictive coding
- Code-excited linear prediction (CELP), 2400 and 4800 bit/s, Federal Standard 1016, used in STU-III
- Continuously variable slope delta modulation (CVSD), 16 kbit/s, used in wide band encryptors such as the KY-57
- Mixed-excitation linear prediction (MELP), MIL STD 3005, 2400 bit/s, used in the Future Narrowband Digital Terminal (FNBDT), NSA's 21st century secure telephone
- Adaptive Differential Pulse Code Modulation (ADPCM), former ITU-T G.721, 32 kbit/s, used in the STE secure telephone (ADPCM is not a proper vocoder but rather a waveform codec. ITU has gathered G.721 along with some other ADPCM codecs into G.726.)
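The data rates quoted above follow from simple arithmetic, shown here as a short Python sketch. The 22.5 ms frame length used for the per-frame bit budget is the figure commonly cited for LPC-10 and is an assumption of this example rather than something stated in this report.

```python
def pcm_bit_rate(sample_rate_hz, bits_per_sample):
    """Uncompressed PCM data rate in bit/s."""
    return sample_rate_hz * bits_per_sample


def bits_per_frame(bit_rate, frame_ms):
    """Bits available for one frame of a fixed-rate coder."""
    return bit_rate * frame_ms / 1000


if __name__ == "__main__":
    vocoder_rate = 2400                      # bit/s, the low-rate vocoder figure
    for bits in (12, 16):
        rate = pcm_bit_rate(8000, bits)
        print(f"{bits}-bit PCM: {rate / 1000:.0f} kbit/s, "
              f"about {rate / vocoder_rate:.0f} times the 2.4 kbit/s rate")
    # Assumed 22.5 ms frames at 2400 bit/s leave 54 bits per frame
    print(bits_per_frame(vocoder_rate, 22.5))  # prints: 54.0
```

Run as-is, this reports 96 and 128 kbit/s for 12-bit and 16-bit PCM, that is roughly 40 and 53 times the 2.4 kbit/s vocoder rate.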
Vocoders are also currently used in psychophysics, linguistics, computational neuroscience and cochlear implant research.

Modern vocoders that are used in communication equipment and in voice storage devices today are based on the following algorithms:
- Algebraic code-excited linear prediction (ACELP, 4.7 kbit/s to 24 kbit/s)
- Mixed-excitation linear prediction (MELPe, 2400, 1200 and 600 bit/s)
- Multi-band excitation (AMBE, 2000 bit/s to 9600 bit/s)
- Sinusoidal-Pulsed Representation (SPR, 300 bit/s to 4800 bit/s)
- Tri-Wave Excited Linear Prediction (TWELP, 600 bit/s to 9600 bit/s)

CHAPTER 3 BLOCK DIAGRAM
3.1 Block diagram description

The general block diagram of LPC (Fig 3.2) consists of the following blocks:
- A/D converter
- End point detection
- Pre-emphasis filter
- Frame blocking
- Hamming window
- Auto-correlation
- Levinson-Durbin algorithm

The resulting coefficients are then passed to an SSD comparison stage that produces the output.

[Fig 3.1: LPC analysis and synthesis of speech]

[Fig 3.2: General Block Diagram. A/D Converter -> End Point Detection -> Pre-emphasis Filter -> Frame Blocking -> Hamming Window -> Auto-Correlation -> Levinson-Durbin Algorithm -> SSD Comparison -> Output]

3.2.1 A/D Converter

For the Motorola DSP56303, the analog signal is converted to digital samples by an assembly file called 'core302.asm'. The samples are input from the CODEC A/D input port, as shown in Fig. 5. The assembly file initializes the necessary peripheral settings for general I/O purposes. The file also contains a macro called waitdata, which waits for a sample and reads it in. The sampling rate is set to 8000 samples/second.

3.2.2 End Point Detection

In end point detection, each sample taken from the A/D converter is compared to a volume threshold. If the sample is lower than the threshold, it is considered background noise and disregarded. Otherwise, the DSP board outputs 4 bits high to Port B to indicate that it is ready to process speech samples, and the next 2000 samples (250 ms at 8000 samples/second) are stored in a buffer before processing.

3.2.3 Pre-emphasis filter

The pre-emphasis filter is a low-order digital filter with the transfer function shown in Equation (3.1):

$$H(z) = 1 - 0.9375\,z^{-1} \qquad (3.1)$$

The digitized speech signal passes through this filter to even out differences in transmission conditions, background noise, and signal spectra. The filter boosts the high-frequency components of the human voice and attenuates its low-frequency components. Because the human voice typically has more power at low frequencies, this flattening makes the speech samples easier to handle in the LPC calculation.

3.2.4 Frame blocking

The pre-emphasized speech samples are divided into 30 ms window frames. Each 30 ms frame consists of 240 samples, as given by Equations (3.2) and (3.3):

$$(\text{Sampling rate})(\text{Frame length}) = \text{Number of samples in a frame} \qquad (3.2)$$

$$(8000\ \text{samples/second})(0.030\ \text{second}) = 240\ \text{samples} \qquad (3.3)$$
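End point detection and pre-emphasis can be sketched in Python as shown below (the project itself runs them on the DSP56303). Comparing the sample magnitude with the threshold, keeping the triggering sample as the first buffered sample, and the function names are assumptions of this sketch; the 2000-sample buffer and the 0.9375 coefficient come from the text. Feeding in a constant (DC) signal and an alternating (Nyquist-frequency) signal shows the filter's low-frequency attenuation and high-frequency boost.

```python
def detect_endpoint(samples, threshold, capture_len=2000):
    """Return capture_len samples starting at the first sample whose magnitude
    reaches the threshold, or None if every sample is treated as noise."""
    for i, s in enumerate(samples):
        if abs(s) >= threshold:
            return samples[i:i + capture_len]
    return None


def pre_emphasis(x, alpha=0.9375):
    """y(n) = x(n) - alpha * x(n - 1), i.e. H(z) = 1 - alpha z^-1 (Eq. 3.1)."""
    return [x[n] - alpha * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


if __name__ == "__main__":
    print(pre_emphasis([1.0, 1.0, 1.0, 1.0]))    # [1.0, 0.0625, 0.0625, 0.0625]
    print(pre_emphasis([1.0, -1.0, 1.0, -1.0]))  # [1.0, -1.9375, 1.9375, -1.9375]
```

After the first sample, a constant input is scaled by 1 - 0.9375 = 0.0625 while an alternating input is scaled by 1 + 0.9375 = 1.9375, which is the high-frequency emphasis the text describes.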
In addition, adjacent window frames are separated by 80 samples (240 × 1/3), so consecutive frames share 160 overlapping samples. The amount of separation and overlap depends on the frame length, and the frame length is chosen according to the sampling rate: the higher the sampling rate, the more samples a frame needs to cover the same duration.

3.2.5 Hamming window

In the LPC front end, each frame is multiplied by a Hamming window, which tapers the samples toward the frame edges and so reduces the discontinuities introduced by frame blocking.

Window functions such as the Hamming window are also used in filter design. The windowing method involves multiplying the ideal impulse response by a window function to generate a corresponding filter, which tapers the ideal impulse response. Like the frequency sampling method, the windowing method produces a filter whose frequency response approximates a desired frequency response; the windowing method, however, tends to produce better results than the frequency sampling method.

The MATLAB Image Processing Toolbox provides two functions for window-based two-dimensional filter design, fwind1 and fwind2. fwind1 designs a two-dimensional filter by using a two-dimensional window that it creates from one or two one-dimensional windows that you specify. fwind2 designs a two-dimensional filter by using a specified two-dimensional window directly.

fwind1 supports two different methods for making the two-dimensional windows it uses:
- Transforming a single one-dimensional window to create a two-dimensional window that is nearly circularly symmetric, by using a process similar to rotation
- Creating a rectangular, separable window from two one-dimensional windows, by computing their outer product

The example below uses fwind1 to create an 11-by-11 filter from the desired frequency response Hd. The example uses the Signal Processing Toolbox hamming function to create a one-dimensional window, which fwind1 then extends to a two-dimensional window.

```matlab
% Desired frequency response: an ideal square passband in the centre
Hd = zeros(11,11);
Hd(4:8,4:8) = 1;
[f1,f2] = freqspace(11,'meshgrid');   % normalized frequency grid
mesh(f1,f2,Hd), axis([-1 1 -1 1 0 1.2]), colormap(jet(64))

% Design the filter from a 1-D Hamming window extended to 2-D
h = fwind1(Hd,hamming(11));
figure, freqz2(h,[32 32]), axis([-1 1 -1 1 0 1.2])
```
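Frame blocking with 240-sample frames and an 80-sample shift, followed by Hamming windowing of each frame, can be sketched in Python as follows. Dropping an incomplete final frame is an assumption of this sketch; the window is the standard symmetric Hamming form 0.54 - 0.46 cos(2πi/(N - 1)), which is also the form MATLAB's hamming(N) returns by default.

```python
import math

FRAME_LEN = 240    # 30 ms at 8000 samples/s, Eq. (3.3)
FRAME_SHIFT = 80   # 240 x 1/3, so adjacent frames overlap by 160 samples


def frame_signal(x, frame_len=FRAME_LEN, shift=FRAME_SHIFT):
    """Split x into overlapping frames; an incomplete final frame is dropped."""
    return [x[start:start + frame_len]
            for start in range(0, len(x) - frame_len + 1, shift)]


def hamming(n):
    """Symmetric Hamming window w(i) = 0.54 - 0.46 cos(2 pi i / (n - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def windowed_frames(x):
    """Frame blocking followed by Hamming windowing of every frame."""
    w = hamming(FRAME_LEN)
    return [[s * wi for s, wi in zip(frame, w)] for frame in frame_signal(x)]


if __name__ == "__main__":
    buffer = [1.0] * 2000          # e.g. the 2000 samples kept after end point detection
    frames = windowed_frames(buffer)
    print(len(frames))             # (2000 - 240) // 80 + 1 = 23 frames
```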
The figure below shows the desired two-dimensional frequency response (left) and the actual two-dimensional frequency response (right).

[Fig 3.4: Two-Dimensional Frequency Response]

Creating the Desired Frequency Response Matrix

The filter design functions fsamp2, fwind1, and fwind2 all create filters based on a desired frequency response magnitude matrix. Frequency response is a mathematical function describing the gain of a filter in response to different input frequencies.

3.2.6 Auto-Correlation

The Autocorrelation LPC block determines the coefficients of an N-step forward linear predictor for the time series in each length-M input channel, u, by minimizing the prediction error in the least-squares sense. A linear predictor is an FIR filter that predicts the next value in a sequence from the present and past inputs. This technique has applications in filter design, speech coding, spectral analysis, and system identification.

The Autocorrelation LPC block can output the prediction-error filter for each channel as polynomial coefficients, reflection coefficients, or both. It can also output the prediction error power for each channel. The input u can be a scalar, unoriented vector, column vector, sample-based row vector, or a matrix; frame-based row vectors are not valid inputs. The block treats all M-by-N matrix inputs as N channels of length M.

When you select "Inherit prediction order from input dimensions", the prediction order, N, is inherited from the input dimensions. Otherwise, you can use the "Prediction order" parameter to specify the value of N. N must be a scalar with a value less than the length of the input channels, or the block produces an error.

When "Output(s)" is set to A, port A is enabled. For each channel, port A outputs an (N+1)-by-1 column vector, $a = [1\ a_2\ a_3\ \dots\ a_{N+1}]^T$, containing the coefficients of an Nth-order moving average (MA) linear process that predicts the next value, $\hat{u}_{M+1}$, in the input time series.

When "Output(s)" is set to K, port K is enabled. For each channel, port K outputs a length-N column vector whose elements are the prediction error reflection coefficients. When "Output(s)" is set to "A and K", both ports A and K are enabled, and each port outputs its respective set of prediction coefficients for each channel.

When you select "Output prediction error power (P)", port P is enabled. The prediction error power is output at port P as a vector whose length is the number of input channels.

3.2.7 Levinson-Durbin algorithm

The Levinson-Durbin block solves the nth-order system of linear equations Ra = b in the case where:
- R is a Hermitian, positive-definite, Toeplitz matrix.
- b is identical to the first column of R shifted by one element and with the opposite sign.

$$\begin{bmatrix} r(1) & r(2)^{*} & \cdots & r(n)^{*} \\ r(2) & r(1) & \cdots & r(n-1)^{*} \\ \vdots & \vdots & \ddots & \vdots \\ r(n) & r(n-1) & \cdots & r(1) \end{bmatrix} \begin{bmatrix} a(2) \\ a(3) \\ \vdots \\ a(n+1) \end{bmatrix} = \begin{bmatrix} -r(2) \\ -r(3) \\ \vdots \\ -r(n+1) \end{bmatrix}$$

The input to the block, r = [r(1) r(2) ... r(n+1)], can be a vector or a matrix. If the input is a matrix, the block treats each column as an independent channel and solves it separately. Each channel of the input contains lags 0 through n of an autocorrelation sequence, which appear in the matrix R.
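The autocorrelation step and the Ra = b system can be made concrete with a Python sketch that builds R and b from a frame and solves the system directly by Gaussian elimination. It assumes real-valued samples (so the Hermitian matrix is simply symmetric) and an unnormalized autocorrelation; it is a hand-checkable illustration, not a model of the Simulink blocks.

```python
def autocorrelation(frame, order):
    """Lags r[0..order]; r[0] is the zero-lag value, written r(1) in the text."""
    n = len(frame)
    return [sum(frame[i] * frame[i - k] for i in range(k, n))
            for k in range(order + 1)]


def normal_equations(r, order):
    """Build the Toeplitz system R a = b for real-valued lags."""
    R = [[r[abs(i - j)] for j in range(order)] for i in range(order)]
    b = [-r[i + 1] for i in range(order)]
    return R, b


def solve(M, v):
    """Gaussian elimination with partial pivoting (O(n^3); for checking only)."""
    n = len(v)
    aug = [list(row) + [v[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for row in range(col + 1, n):
            f = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x


def lpc_direct(frame, order):
    """Prediction-error filter A = [1, a(2), ..., a(order+1)] by direct solution."""
    r = autocorrelation(frame, order)
    R, b = normal_equations(r, order)
    return [1.0] + solve(R, b)


if __name__ == "__main__":
    print(autocorrelation([1.0, 2.0, 3.0], 2))  # [14.0, 8.0, 3.0]
    print(lpc_direct([1.0, 2.0, 3.0], 2))       # approximately [1, -2/3, 1/6]
```

For the three-sample frame the system is [[14, 8], [8, 14]] a = [-8, -3], whose exact solution is a(2) = -2/3 and a(3) = 1/6.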
The block can output the polynomial coefficients, A, the reflection coefficients, K, and the prediction error power, P, in various combinations. The "Output(s)" parameter allows you to enable the A and K outputs by selecting one of the following settings:
- A: For each channel, port A outputs A = [1 a(2) a(3) ... a(n+1)], the solution to the Levinson-Durbin equation. A has the same dimension as the input. You can also view the elements of each output channel as the coefficients of an nth-order autoregressive (AR) process.
- K: For each channel, port K outputs K = [k(1) k(2) ... k(n)], which contains n reflection coefficients and has the same dimension as the input, less one element. A scalar input channel causes an error when you select K. You can use reflection coefficients to realize a lattice representation of the AR process described later in this section.
- A and K: The block outputs both representations at their respective ports. A scalar input channel causes an error when you select A and K.

Select the "Output prediction error power (P)" check box to output the prediction error power for each channel, P. For each channel, P represents the power of the output of an FIR filter with taps A and input autocorrelation described by r, where A represents a prediction error filter and r is the input to the block. In this case, A is a whitening filter. P has one element per input channel.

When you select the "If the value of lag 0 is zero, A=[1 zeros], K=[zeros], P=0" check box (default), an input channel whose r(1) element is zero generates a zero-valued output. When you clear this check box, an input with r(1) = 0 generates NaNs in the output. In general, an input with r(1) = 0 is invalid because it does not construct a positive-definite matrix R. Often, however, blocks receive zero-valued inputs at the start of a simulation.
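The Levinson-Durbin recursion, which solves the Ra = b system in O(n²) operations, can be sketched in Python as follows. The sketch handles real-valued lags only, returns A, K and P in the order used above, and copies the default zero-lag behaviour described in the text. Taking each k as the last coefficient of the order-m predictor is a convention chosen here; some references flip the sign of the reflection coefficients, so check the convention before comparing against a particular tool.

```python
import math


def levinson_durbin(r, order=None):
    """Levinson-Durbin recursion for real lags r[0..order].

    Returns (A, K, P): A = [1, a(2), ..., a(order+1)], the reflection
    coefficients K = [k(1), ..., k(order)], and the prediction error power P.
    """
    if order is None:
        order = len(r) - 1
    if r[0] == 0:
        # Mirrors the default zero-lag option: A = [1 zeros], K = [zeros], P = 0
        return [1.0] + [0.0] * order, [0.0] * order, 0.0
    a = [1.0]
    k_list = []
    err = float(r[0])
    for m in range(1, order + 1):
        acc = sum(a[i] * r[m - i] for i in range(m))  # r[m] + a1 r[m-1] + ...
        k = -acc / err
        padded = a + [0.0]
        a = [padded[i] + k * padded[m - i] for i in range(m + 1)]
        k_list.append(k)
        err *= 1.0 - k * k                           # error power shrinks each order
    return a, k_list, err


if __name__ == "__main__":
    # Exact autocorrelation of the AR(1) process x(n) = 0.9 x(n-1) + e(n)
    # with unit-variance white noise: r(k) = 0.9**k / (1 - 0.81).
    r = [0.9 ** k / (1 - 0.81) for k in range(3)]
    A, K, P = levinson_durbin(r)
    print(A)                 # close to [1, -0.9, 0]
    print(K)                 # close to [-0.9, 0]
    print(P, math.sqrt(P))   # P and the gain G = sqrt(P), both close to 1
```

With the exact autocorrelation of an AR(1) process as input, the recursion recovers the generating coefficient (-0.9), a second-order coefficient of zero, and a prediction error power equal to the driving-noise variance, as the Yule-Walker interpretation predicts.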
The check box allows you to avoid propagating NaNs during this period.

Applications

One application of the Levinson-Durbin formulation implemented by this block is the Yule-Walker AR problem, which concerns modeling an unknown system as an autoregressive process. Such a process is modeled as the output of an all-pole IIR filter with white Gaussian noise input. In the Yule-Walker problem, using the signal's autocorrelation sequence to obtain an optimal estimate leads to an Ra = b equation of the type shown above, which is most efficiently solved by Levinson-Durbin recursion. In this case, the input to the block represents the autocorrelation sequence, with r(1) being the zero-lag value. The output at the block's A port then contains the coefficients of the autoregressive process that optimally models the system. The coefficients are ordered in descending powers of z, and the AR process is minimum phase. The prediction error, G, defines the gain for the unknown system, where:

$$G = \sqrt{r(1)\prod_{i=1}^{n}\left(1 - |k(i)|^{2}\right)}$$

The output at the block's K port contains the corresponding reflection coefficients, [k(1) k(2) ... k(n)], for the lattice realization of this IIR filter. The Yule-Walker AR Estimator block implements this autocorrelation-based method for AR model estimation, while the Yule-Walker Method block extends the method to spectral estimation.

Another common application of the Levinson-Durbin algorithm is in linear predictive coding, which is concerned with finding the coefficients of a moving average (MA) process (or FIR filter) that predicts the next value of a signal from the current signal sample and a finite number of past samples. In this case, the input to the block represents the signal's autocorrelation sequence, with r(1) being the zero-lag value, and the output at the block's A port contains the coefficients of the predictive MA process (in descending powers of z). These coefficients solve the following optimization problem:

$$\min_{a(2),\dots,a(n+1)} E\left[\left|x(t) - \hat{x}(t)\right|^{2}\right], \qquad \hat{x}(t) = -\sum_{i=1}^{n} a(i+1)\,x(t-i)$$

Again, the output at the block's K port contains the corresponding reflection coefficients, [k(1) k(2) ... k(n)], for the lattice realization of this FIR filter. The Autocorrelation LPC block in the Linear Prediction library implements this autocorrelation-based prediction method.

3.2.8 Sum of Square of Difference comparison

The sum of square of difference (SSD) comparison is a quantitative method for comparing two sets of LPC coefficients. Suppose one set of LPC coefficients in the template is A'1, A'2, A'3, ..., A'10, and another set of LPC coefficients obtained from a window frame is A1, A2, A3, ..., A10. Then

$$\mathrm{SSD} = (A'_1 - A_1)^2 + (A'_2 - A_2)^2 + \cdots + (A'_{10} - A_{10})^2 = \sum_{i=1}^{10}\left(A'_i - A_i\right)^2$$

Each time the window frame is shifted, the SSD is calculated between the LPC coefficients from the window frame and every set of LPC coefficients in the template. The template set that gives the minimum SSD with a window frame is the closest match, and its vowel is taken as the recognized input vowel.
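The SSD comparison and the minimum search over frames and templates can be sketched in Python as follows. The dictionary of templates, the function names and the two-coefficient example values are hypothetical; the project compares 10 coefficients per set, and whether the leading 1 of A is included is not stated, so this sketch simply compares whatever equal-length lists it is given.

```python
def ssd(template, coeffs):
    """Sum of squared differences between two equal-length coefficient sets."""
    if len(template) != len(coeffs):
        raise ValueError("coefficient sets must have the same length")
    return sum((t - c) ** 2 for t, c in zip(template, coeffs))


def recognize(templates, frame_coeffs):
    """Return (label, best_ssd): the template with the smallest SSD to any frame.

    templates: dict mapping a vowel/word label to its stored coefficients
    frame_coeffs: LPC coefficient sets computed for successive window frames
    """
    best = None
    for coeffs in frame_coeffs:
        for label, template in templates.items():
            score = ssd(template, coeffs)
            if best is None or score < best[1]:
                best = (label, score)
    return best


if __name__ == "__main__":
    # Hypothetical 2-coefficient templates; the project stores 10 per vowel
    templates = {"a": [1.0, 0.5], "e": [0.2, -0.3]}
    frames = [[0.9, 0.4], [0.0, 0.0]]
    print(recognize(templates, frames)[0])   # prints: a
```

Here the first frame is within an SSD of about 0.02 of template "a", which beats every other frame/template pair, so "a" is reported.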
CHAPTER 4 SOFTWARE DESCRIPTION

4.1 MATLAB Introduction

The name MATLAB stands for MATrix LABoratory. MATLAB was originally written to provide easy access to matrix software developed by the LINPACK (linear system package) and EISPACK (eigensystem package) projects. MATLAB is a high-performance language for technical computing that integrates computation, visualization, and a programming environment. Furthermore, MATLAB is a modern programming language environment: it has sophisticated data structures, contains built-in editing and debugging tools, and supports object-oriented programming. These factors make MATLAB an excellent tool for teaching and research.

MATLAB has many advantages compared to conventional computer languages (e.g., C, FORTRAN) for solving technical problems. MATLAB is an interactive system whose basic data element is an array that does not require dimensioning. The software package has been commercially available since 1984 and is now considered a standard tool at most universities and in industry worldwide. It has powerful built-in routines that enable a very wide variety of computations, and easy-to-use graphics commands that make the visualization of results immediately available. Specific applications are collected in packages referred to as toolboxes. There are toolboxes for signal processing, symbolic computation, control theory, simulation, optimization, and several other fields of applied science and engineering.

4.2 Mathematical functions

MATLAB offers a large set of predefined mathematical functions for technical computing. Typing help elfun and help specfun calls up full lists of elementary and special functions, respectively. There is a long list of mathematical functions that are built into MATLAB.
These functions are called built-ins. Many standard mathematical functions, such as sin(x), cos(x), tan(x), e^x, and ln(x), are evaluated in MATLAB by the functions sin, cos, tan, exp, and log, respectively.

4.3 Basic plotting:
MATLAB has an excellent set of graphics tools. Plotting a given data set or the results of a computation is possible with very few commands. We are strongly encouraged to plot
- 20. mathematical functions and results of analysis as often as possible. Trying to understand mathematical equations with graphics is an enjoyable and very efficient way of learning mathematics.

4.4 Matrix generation:
Matrices are the basic elements of the MATLAB environment. A matrix is a two-dimensional array consisting of m rows and n columns. Special cases are column vectors (n = 1) and row vectors (m = 1). MATLAB supports two types of operations, known as matrix operations and array operations. MATLAB provides functions that generate elementary matrices: the matrix of zeros, the matrix of ones, and the identity matrix are returned by the functions zeros, ones, and eye, respectively.

Table 3.1: Elementary matrices

4.5 Programming in MATLAB:
4.5.1 M-File scripts:
A script file is an external file that contains a sequence of MATLAB statements. Script files have the filename extension .m and are often called M-files. M-files can be scripts that simply execute a series of MATLAB statements, or they can be functions that accept arguments and can produce one or more outputs.

4.5.2 Script side-effects:
All variables created in a script file are added to the workspace. This may have undesirable effects, because variables already existing in the workspace may be overwritten, and the execution of the script can be affected by the state of variables in the workspace. Because of these side-effects, it is better to code any complicated application as a function M-file.

4.5.3 Input to script files:
- 21. When a script file is executed, the variables used in the calculations within the file must have assigned values. A value can be assigned to a variable in three ways:
1. The variable is defined in the script file.
2. The variable is defined at the command prompt.
3. The variable is entered when the script is executed.

4.5.4 Output commands:
MATLAB automatically generates a display when commands are executed. In addition to this automatic display, MATLAB has several commands that can be used to generate displays or outputs. Two commands that are frequently used to generate output are disp and fprintf.

Table: The disp and fprintf commands

4.5.5 Saving output to a file:
In addition to displaying output on the screen, the command fprintf can be used for writing output to a file. The saved data can subsequently be used by MATLAB or other software. Saving the results of a computation to a file in text format requires the following steps:
1. Open a file using fopen.
2. Write the output using fprintf.
3. Close the file using fclose.

4.6 Debugging M-files:
4.6.1 Introduction:
This section introduces general techniques for finding errors in M-files. Debugging is the process by which you isolate and fix errors in your program or code. Debugging helps to correct two kinds of errors:
- Syntax errors: for example, omitting a parenthesis or misspelling a function name.
- Run-time errors: these usually become apparent only when an M-file produces unexpected results, and they can be difficult to track down.

4.6.2 Debugging process:
- 22. We can debug M-files using the Editor/Debugger as well as debugging functions from the Command Window. The debugging process consists of:
- Preparing for debugging
- Setting breakpoints
- Running an M-file with breakpoints
- Stepping through an M-file
- Examining values
- Correcting problems
- Ending debugging

4.7 Strengths:
- MATLAB may behave as a calculator or as a programming language.
- MATLAB combines calculation and graphic plotting nicely.
- MATLAB is relatively easy to learn.
- MATLAB is interpreted (not compiled), so errors are easy to fix.
- MATLAB is optimized to be relatively fast when performing matrix operations.

4.8 Weaknesses:
- MATLAB is not a general-purpose programming language such as C, C++, or FORTRAN.
- MATLAB is designed for scientific computing and is not well suited to other applications.
- MATLAB is an interpreted language and is therefore slower than a compiled language such as C++.
- MATLAB commands are specific to MATLAB. Most of them do not have a direct equivalent in other programming languages.

CHAPTER 5 RESULTS AND CONCLUSIONS
- 23. 5.1 Result
Implementing an LPC vocoder is an exciting and challenging task. Many techniques were learned from the literature and from practice during this work. Given the complexity of the voiced/unvoiced decision in the LPC-10e DoD vocoder, it is clear that a good algorithm needs considerable intelligence and adaptability to give good results. The main problem is the estimation of the pitch. Second, a robust voiced/unvoiced decision is very important.

Fig 5.1: Screen shot of input
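The report identifies pitch estimation and the voiced/unvoiced decision as the hard parts but does not list its own method here. As a point of reference only, the Python sketch below picks the strongest autocorrelation peak in a plausible pitch-lag range and declares the frame unvoiced when that peak is weak or the frame is nearly silent. The 8 kHz sampling rate, the 60 to 400 Hz search range, the 0.3 voicing threshold and the energy floor are all assumptions; this is a far simpler technique than the LPC-10e voicing detector mentioned above, not a reproduction of it.

```python
import math  # used by the usage example below to generate a test tone


def estimate_pitch(frame, fs=8000, f_min=60.0, f_max=400.0,
                   voicing_threshold=0.3, energy_floor=1e-6):
    """Return the pitch in Hz of a voiced frame, or 0.0 if judged unvoiced.

    Autocorrelation peak picking with a fixed-threshold voicing test; all
    default parameter values are illustrative assumptions.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]          # remove DC so it does not inflate r(k)
    r0 = sum(s * s for s in x)             # r(0), the frame energy
    if r0 / n < energy_floor:
        return 0.0                         # (near) silence: treat as unvoiced
    lag_min = int(fs / f_max)              # shortest period considered
    lag_max = min(int(fs / f_min), n - 1)  # longest period that fits the frame
    best_lag, best_r = 0, 0.0
    for k in range(lag_min, lag_max + 1):
        # Biased, normalized autocorrelation r(k)/r(0): the shrinking overlap
        # (n - k) penalizes long lags, discouraging multiples of the period.
        rk = sum(x[i] * x[i + k] for i in range(n - k)) / r0
        if rk > best_r:
            best_lag, best_r = k, rk
    if best_lag == 0 or best_r < voicing_threshold:
        return 0.0                         # weak periodicity: unvoiced
    return fs / best_lag
```

Calling it on each analysis frame (for example 240 samples, i.e. 30 ms at 8 kHz) yields one value per frame; a 200 Hz sine sampled at 8 kHz returns 200.0, and an all-zero frame returns 0.0. The sequence of these values is the pitch contour that the next slide discusses smoothing.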
- 24. Fig 5.2: Screen shot of output

It was found that taking the memory of the LPC filter into account leads to better results. The median filter was not able to give a smooth pitch contour. Techniques such as avoiding abrupt changes in the pitch value and avoiding pitch doubling and halving should be incorporated to get better results.
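Both findings can be illustrated with short Python sketches. Here "memory of the LPC filter" is assumed to mean the synthesis filter's past outputs carried from one frame to the next rather than reset to zero, and the coefficient convention a = [1, a_1, ..., a_p] (as in MATLAB's lpc) is an assumption too. The pitch part is one possible way to avoid pitch doubling and halving before median smoothing; the function names, the 15% tolerance and the use of 0.0 for unvoiced frames are hypothetical choices, not the report's.

```python
# Sketches for the two results above. Assumed conventions: a = [1, a_1, ..., a_p]
# and a pitch value of 0.0 marks an unvoiced frame.


def synthesize_frame(a, excitation, memory):
    """All-pole synthesis y[n] = e[n] - sum_{k=1..p} a[k] * y[n-k].

    `memory` holds the last p outputs, most recent first, and is updated in
    place, so the next frame continues from this frame's final filter state.
    """
    p = len(a) - 1
    out = []
    for e in excitation:
        y = e - sum(a[k] * memory[k - 1] for k in range(1, p + 1))
        out.append(y)
        memory.insert(0, y)  # newest output becomes y[n-1] for the next sample
        memory.pop()         # keep exactly p past outputs
    return out


def fix_octave_errors(track, tol=0.15):
    """Halve suspected pitch doublings and double suspected halvings.

    Each voiced value is compared with the last corrected voiced value; a
    ratio within `tol` (relative) of 2 or 0.5 is treated as an octave error.
    """
    fixed, prev = [], None
    for f0 in track:
        if f0 > 0 and prev is not None:
            ratio = f0 / prev
            if abs(ratio - 2.0) < 2.0 * tol:
                f0 /= 2.0
            elif abs(ratio - 0.5) < 0.5 * tol:
                f0 *= 2.0
        if f0 > 0:
            prev = f0
        fixed.append(f0)
    return fixed


def median_smooth(track):
    """3-point median over voiced frames; unvoiced frames stay 0.0.

    A voiced frame is replaced by the median only when all three frames in its
    window are voiced; otherwise its value is kept unchanged.
    """
    out = []
    for i, f0 in enumerate(track):
        if f0 <= 0:
            out.append(0.0)
            continue
        window = [v for v in track[max(i - 1, 0):i + 2] if v > 0]
        out.append(sorted(window)[1] if len(window) == 3 else f0)
    return out
```

Passing the same `memory` list to consecutive `synthesize_frame` calls is what preserves the filter state; creating a fresh zero list per frame reproduces the "no memory" behaviour for comparison. For the pitch track, folding octave errors before the median keeps a single doubled frame (e.g. 100, 200, 102 Hz) from dragging the median upward.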
