
Digital filter design using VHDL



Submitted and approved as a Final year project

Published in: Engineering


DIGITAL FILTER DESIGN

CONTENTS
1. INTRODUCTION
2. ELECTRICAL FILTER
3. COMPARISON OF IIR & FIR FILTERS
   I. BUTTERWORTH FILTER
   II. ELLIPTIC FILTER
   III. CHEBYSHEV FILTER
4. EFFECT OF POLES & ZEROS
5. BILINEAR TRANSFORMATION
6. IIR FILTER REALIZATION
   I. DIRECT FILTER REALIZATION
   II. CASCADE FILTER REALIZATION
7. VHDL: THE LANGUAGE
   I. LEVELS OF ABSTRACTION
   II. BIT PARALLEL ARITHMETIC
      A. ADDITION & SUBTRACTION
      B. MULTIPLICATION
   III. BIT SERIAL ARITHMETIC
      A. ADDITION & SUBTRACTION
      B. MULTIPLICATION
      C. SHIFT & ADD MULTIPLIERS
      D. SHIFT & PARALLEL MULTIPLIERS
      E. LATENCY
      F. THROUGHPUT
8. IMPLEMENTATION & ANALYSIS OF SUB-BLOCKS
   I. ADDER
   II. DELAY
   III. SERIAL-PARALLEL MULTIPLIER
   IV. BOOTH MULTIPLIER
   V. MAC
9. IMPLEMENTATION & ANALYSIS OF FIR FILTERS
   I. DIRECT FORM REALIZATION
      A. USING BIT PARALLEL ARITHMETIC
      B. USING BIT SERIAL ARITHMETIC
      C. AREA ANALYSIS
      D. POWER ANALYSIS
   II. CASCADE REALIZATION
      A. USING BIT PARALLEL ARITHMETIC
      B. USING BIT SERIAL ARITHMETIC
      C. AREA ANALYSIS
      D. POWER ANALYSIS
10. CONCLUSION
11. FUTURE PLANS
12. VHDL CODES FOR FIR FILTERS
   I. USING BIT PARALLEL ARITHMETIC
      A. 4-BIT COUNTER
      B. BOOTH MULTIPLIER
      C. 16-BIT FULL ADDER
      D. MULTIPLIER
      E. SERIAL-PARALLEL CONVERTER
      F. FIR FILTER
      G. ALU DESIGN
   II. USING BIT SERIAL ARITHMETIC
      A. D FLIP-FLOP
      B. FULL ADDER
      C. HALF ADDER
      D. RIGHT SHIFTER
      E. DELAY
      F. PIPELINE
      G. FIR FILTER
13. REFERENCES
LIST OF IMAGES
FIG1: MAGNITUDE RESPONSE OF BUTTERWORTH FILTER
FIG2: MAGNITUDE RESPONSE OF ELLIPTIC FILTER
FIG3: MAGNITUDE RESPONSE OF CHEBYSHEV FILTER
FIG4: EFFECTS OF POLES & ZEROS
FIG5: STABLE TRANSFORMATION
FIG6: IIR FILTER BLOCK
FIG7: DIRECT REALIZATION OF IIR FILTER
FIG8: CASCADE REALIZATION OF IIR FILTER
FIG9: BIT PARALLEL RIPPLE CARRY ADDER
FIG10: MATRIX PRODUCT OF MULTIPLICATION
FIG11: ARRAY MULTIPLIER OF TWO'S COMPLEMENT NUMBERS
FIG12: BIT SERIAL ADDER & SUBTRACTOR
FIG13: SHIFT & ADD MULTIPLIER
FIG14: S/P MULTIPLIER USING SHIFT ACCUMULATOR
FIG15: SIMPLIFIED S/P MULTIPLIER
FIG16: SIMPLIFIED MULTIPLIER STRUCTURE
FIG17: LATENCY & THROUGHPUT OF A PROCESSING ELEMENT
FIG18: INCREASED THROUGHPUT WITHOUT AFFECTING LATENCY
FIG19: TWO INPUT ADDER BLOCK
FIG20: RTL SCHEMATIC OF ADDER BLOCK
FIG21: OUTPUT RESULT OF ADDER BLOCK
FIG22: BIT SERIAL ADDER BLOCK
FIG23: TEST BENCH WAVEFORM OF BIT-SERIAL ADDER
FIG24: BIT-SERIAL & PARALLEL DELAY BLOCK
FIG25: TEST BENCH WAVEFORM OF DELAY BLOCK
FIG26: S/P MULTIPLIER BLOCK
FIG27: OUTPUT RESULT OF S/P MULTIPLIER
FIG28: SIMULATION RESULT OF BOOTH'S MULTIPLIER
FIG29: MAC CIRCUIT
FIG30: DIRECT FORM REALIZATION OF FIR FILTER
FIG31: FIR FILTER DIAGRAM
FIG32: RTL REPRESENTATION OF FIR FILTER
FIG33: DIRECT FORM REALIZATION OF FIR FILTER (BIT PARALLEL)
FIG34: DIRECT FORM REALIZATION OF FIR FILTER (BIT SERIAL)
FIG35: OUTPUT WAVEFORM OF POWER ANALYSIS OF FIR FILTER (BIT PARALLEL)
FIG36: OUTPUT WAVEFORM OF POWER ANALYSIS OF FIR FILTER (BIT SERIAL)
FIG37: CASCADE REALIZATION
FIG38: CASCADE REALIZATION
FIG39: FIR FILTER CASCADE REALIZATION
FIG40: STRUCTURE OF ARITHMETIC OPERATION
FIG41: FIR FILTER CASCADE REALIZATION USING BIT SERIAL ARITHMETIC
FIG42: OUTPUT WAVEFORM OF POWER ANALYSIS OF LATTICE REALIZATION USING BIT PARALLEL ARITHMETIC
LIST OF CHARTS
CHART 1: DESIGN SUMMARY OF DIRECT FORM REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC
CHART 2: DESIGN SUMMARY OF DIRECT FORM REALIZATION OF FIR FILTER USING BIT SERIAL ARITHMETIC
CHART 3: POWER SUMMARY OF DIRECT FORM REALIZATION
CHART 4: DESIGN SUMMARY OF CASCADE FORM REALIZATION USING BIT PARALLEL ARITHMETIC
CHART 5: DESIGN SUMMARY OF CASCADE FORM REALIZATION USING BIT SERIAL ARITHMETIC
CHART 6: POWER SUMMARY OF CASCADE FORM REALIZATION
CHART 7: DESIGN SUMMARY OF LATTICE REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC
CHART 8: DESIGN SUMMARY OF LATTICE REALIZATION OF FIR FILTER USING BIT SERIAL ARITHMETIC
CHART 9: POWER SUMMARY OF LATTICE REALIZATION OF FIR FILTER
INTRODUCTION
Many of today's electronic applications contain some form of signal processing, including systems for music, radar, sonar, audio, video, and communication. Some of these represent small-volume markets, while others are high-volume consumer products such as mobile phones. There are several reasons for the increased use of digital signal processing compared with its analog counterparts. One is the advent of VLSI, which allows large, complex systems to be manufactured in large quantities at a low cost per unit. Another is that digital circuitry removes the need for the tuning that analog circuits generally require. Stringent requirements on communication systems to use limited resources such as bandwidth and transmitter power efficiently have led to complex signal processing algorithms that are only practical to implement using digital signal processing (DSP). Typical DSP operations are frequency-selective and adaptive filtering, time-frequency transformations, and sample-rate changes.

The signals to be processed are obtained either from nature or from man-made machines. Signal processing generally aims at extracting information or at transforming the signal into a form better suited for transmission or storage. Signal processing systems often have only a finite time available to compute the result. Some systems can tolerate a missed deadline, while others produce an unacceptable error if a deadline is exceeded. The latter are called hard real-time systems, and they are the type of signal processing system discussed here.

Signal processing can be implemented using various signal representations and circuit techniques. The evolution has gone from analog, time-continuous signal processing such as passive LC filters, through discrete-time analog circuits such as switched-capacitor filters, to purely digital implementations. Analog and switched-capacitor circuits are, however, still needed to interface analog signals to the digital signal processing system (anti-aliasing filters, A/D and D/A converters) and for systems with very high bandwidths.

Theory, design, and implementation of DSP (sub)systems that perform well in terms of throughput, size, and cost are important research and development areas. In addition, the increasing use of portable equipment, together with the cost of cooling electronic equipment, is a strong incentive to reduce the power consumption of DSP (sub)systems. The work presented in this report addresses several important issues in DSP algorithm and hardware co-design, with the aim of obtaining architectures that are efficient with respect to design effort, throughput, chip area, and power consumption, in particular high-speed, low-power implementations of recursive digital filters.
ELECTRICAL FILTER
An electrical filter is a system used to modify, reshape, or manipulate the frequency spectrum of an electrical signal according to prescribed requirements, for example to attenuate a selected frequency component or to locate or isolate a frequency component.

Digital filters can be designed using analog design methods by following these steps:
1. The filter specifications, including the filter type (high-pass, low-pass, etc.), are given in the digital domain.
2. An equivalent analog low-pass prototype filter that meets these specifications is designed.
3. The analog low-pass filter is converted into the required filter type using spectral transformations.
4. The analog filter is transformed into a digital filter using a particular mapping.

Analog filters: Classical theory for analog filters operating below about 100 MHz is generally based on "lumped-parameter" resistors, capacitors, inductors, and operational amplifiers (with feedback), which obey linear time-invariant (LTI) differential equations:
$$ i(t) = C\,\frac{dv(t)}{dt}, \qquad v(t) = L\,\frac{di(t)}{dt}, \qquad v(t) = R\,i(t), \qquad v_o(t) = A\,v_i(t) $$
Analysis of such LTI circuits gives a relationship between the input x(t) and the output y(t) in the form of a differential equation:
$$ b_0\,y(t) + b_1\frac{dy(t)}{dt} + b_2\frac{d^2y(t)}{dt^2} + \dots + b_M\frac{d^My(t)}{dt^M} = a_0\,x(t) + a_1\frac{dx(t)}{dt} + a_2\frac{d^2x(t)}{dt^2} + \dots + a_N\frac{d^Nx(t)}{dt^N} $$
whose system (transfer) function has the form
$$ H_a(s) = \frac{a_0 + a_1 s + a_2 s^2 + \dots + a_N s^N}{b_0 + b_1 s + b_2 s^2 + \dots + b_M s^M} $$
This is a ratio of polynomials in s, and the order of the system function is max(N, M). Replacing s by jΩ gives the frequency response $H_a(j\Omega)$, where Ω denotes frequency in radians/second. For values of s with non-negative real parts, $H_a(s)$ is the Laplace transform of the analog filter's impulse response $h_a(t)$. $H_a(s)$ may be expressed in terms of its poles and zeros as
$$ H_a(s) = k\,\frac{(s - z_1)(s - z_2)\cdots(s - z_N)}{(s - p_1)(s - p_2)\cdots(s - p_M)} $$
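The ratio-of-polynomials form above can be evaluated numerically. The following is a minimal Python sketch (Python is used here only as a modelling language; it is not part of the report's VHDL work) that computes $H_a(j\Omega)$ from the coefficient lists. The function names and the ascending-power ordering of the lists `a` (numerator) and `b` (denominator) are assumptions chosen to match the equations above.

```python
import math

def poly_eval(coeffs, s):
    """Evaluate c0 + c1*s + c2*s**2 + ... (coefficients in ascending powers of s)."""
    result = 0j
    for c in reversed(coeffs):      # Horner's rule, starting from the highest power
        result = result * s + c
    return result

def analog_freq_response(a, b, omega):
    """H_a(j*Omega) for H_a(s) = (a0 + a1 s + ...) / (b0 + b1 s + ...), Omega in rad/s."""
    s = 1j * omega
    return poly_eval(a, s) / poly_eval(b, s)

# Hypothetical first-order low-pass H_a(s) = 1 / (1 + s), i.e. a = [1], b = [1, 1]
h = analog_freq_response([1.0], [1.0, 1.0], 1.0)
print(abs(h), 20 * math.log10(abs(h)))   # magnitude and dB at Omega = 1 rad/s
```

For this hypothetical first-order example, the magnitude at Ω = 1 rad/s is 1/√2, i.e. about -3 dB.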
All real-life signals that are taken as inputs and processed are analog signals, whereas today's systems and their components have largely been digitized. To be used and processed in digital computers, analog signals therefore have to be sampled, processed, and reconstructed via the digital system, which makes samplers and digital filters an integral part of today's electrical systems. There are many methods for transforming an analog filter into a digital filter. Some preferred methods are:
i. Backward difference method
ii. Impulse invariance
iii. Bilinear transformation
iv. Step invariance
There is no single optimum method; the choice depends on the sampling frequency, the highest frequency component of the system, and similar factors.
COMPARISON OF IIR AND FIR DIGITAL FILTERS
IIR digital filters have the advantage of being economical in their use of delays, multipliers, and adders. Their disadvantage is that they are sensitive to coefficient round-off inaccuracies and to the effects of overflow in fixed-point arithmetic, which can lead to instability or serious distortion. In addition, an IIR filter cannot have exactly linear phase.
FIR filters can be realized by non-recursive structures, which are simpler and more convenient to program, especially on devices designed specifically for digital signal processing. These structures are always stable, and because there is no recursion, round-off and overflow errors are easily controlled. An FIR filter can have exactly linear phase. The main disadvantage of FIR filters is that large orders can be required to perform fairly simple filtering tasks.
Note that the frequency response is the transfer function H(z) evaluated around the unit circle in the z-plane (Argand diagram). Since the shape of the transfer function is determined by the positions of its poles and zeros, so is the frequency response. The frequency response can be estimated by tracing around the unit circle in the z-plane:
- project the poles and zeros radially onto the unit circle;
- poles cause bumps (peaks);
- zeros cause dips;
- the closer a pole or zero is to the unit circle, the sharper the feature.
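The graphical rule of thumb above (poles cause bumps, zeros cause dips) can be checked numerically. This is an illustrative Python sketch, assuming H(z) is given in pole-zero form with a gain factor; the pole and zero positions are hypothetical values chosen to produce one dip and one peak.

```python
import cmath
import math

def freq_response_from_pz(zeros, poles, gain, omega):
    """Evaluate H(e^{jw}) = gain * prod(z - z_k) / prod(z - p_k) on the unit circle."""
    z = cmath.exp(1j * omega)
    num = complex(gain)
    for zk in zeros:
        num *= z - zk                  # distance to each zero (small near a dip)
    den = 1 + 0j
    for pk in poles:
        den *= z - pk                  # distance to each pole (small near a bump)
    return num / den

# Hypothetical example: zeros on the unit circle at w = +-pi/2,
# poles at radius 0.9 and angle +-pi/4 (close to the unit circle).
zeros = [cmath.exp(1j * math.pi / 2), cmath.exp(-1j * math.pi / 2)]
poles = [0.9 * cmath.exp(1j * math.pi / 4), 0.9 * cmath.exp(-1j * math.pi / 4)]

for w in (0.0, math.pi / 4, math.pi / 2):
    mag = abs(freq_response_from_pz(zeros, poles, 1.0, w))
    print(f"w = {w:.3f} rad/sample  |H| = {mag:.4f}")
```

The zero placed on the unit circle at ω = π/2 drives |H| to (numerically) zero there, while the response at ω = π/4, next to the pole of radius 0.9, is several times larger than at ω = 0.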
IIR filters can be designed using different methods. One of the most commonly used is via a reference analog prototype filter. This method is well suited to designing all standard filter types, such as low-pass, high-pass, band-pass, and band-stop filters. Here is a summary of three continuous-time low-pass filters:
BUTTERWORTH FILTERS
The Butterworth filter has a maximally flat response in the passband and an adequate rate of roll-off. A good "all-rounder," it is simple to understand and suitable for applications such as audio processing.
FIG1: Magnitude response of Butterworth Filters
ELLIPTIC FILTERS
This filter is equiripple in both the passband and the stopband, i.e., the ripple has equal height throughout each of these bands.
FIG2: Magnitude response of Elliptic Filters
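For reference, the standard magnitude response of an N-th order analog Butterworth low-pass filter with cutoff $\Omega_c$ is $|H(j\Omega)| = 1/\sqrt{1 + (\Omega/\Omega_c)^{2N}}$. The minimal Python sketch below evaluates it (the function name and sample frequencies are illustrative) and shows the flat passband, the -3 dB point at Ωc for every order, and the steeper roll-off for higher N.

```python
import math

def butterworth_magnitude(omega, omega_c, order):
    """|H(j*Omega)| = 1 / sqrt(1 + (Omega/Omega_c)^(2N)) for an N-th order Butterworth low-pass."""
    return 1.0 / math.sqrt(1.0 + (omega / omega_c) ** (2 * order))

# Hypothetical normalized cutoff Omega_c = 1 rad/s; compare orders N = 2, 4, 8
for n in (2, 4, 8):
    gains = [butterworth_magnitude(w, 1.0, n) for w in (0.5, 1.0, 2.0)]
    print(n, [round(g, 4) for g in gains])
```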
CHEBYSHEV FILTERS
The Chebyshev (type I) filter has ripple in the passband. The inverse Chebyshev analog filter, also known as the Chebyshev type II filter, has ripple in the stopband instead.
FIG3: Magnitude response of Chebyshev Filters
The z-transform of the transfer function is of great importance for IIR filters. The locations of the poles in the z-plane are used to test the stability of a designed IIR filter: all poles of the IIR filter transfer function must lie inside the unit circle for the filter to be stable. The figure illustrates the zeros and poles of the transfer function of a stable IIR filter in the z-plane; zeros are denoted by small circles and poles by small crosses.
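Checking that all poles lie inside the unit circle does not require computing the roots explicitly. The sketch below swaps in the Schur-Cohn step-down test (a standard technique, not taken from this report): the denominator $1 + a_1 z^{-1} + \dots + a_N z^{-N}$ is reduced order by order, and the filter is stable exactly when every reflection coefficient has magnitude below one. The function name and example coefficients are hypothetical.

```python
import math

def is_stable(den):
    """Schur-Cohn (step-down) stability test.

    den = [1, a1, ..., aN] are the coefficients of A(z) = 1 + a1 z^-1 + ... + aN z^-N.
    Returns True when every pole of 1/A(z) lies strictly inside the unit circle.
    """
    coeffs = [float(c) / den[0] for c in den]   # normalise so that a0 = 1
    while len(coeffs) > 1:
        m = len(coeffs) - 1                     # current order
        k = coeffs[m]                           # reflection coefficient k_m
        if abs(k) >= 1.0:
            return False
        # step down: a_i^(m-1) = (a_i^(m) - k_m * a_(m-i)^(m)) / (1 - k_m^2)
        coeffs = [(coeffs[i] - k * coeffs[m - i]) / (1.0 - k * k) for i in range(m)]
    return True

# Hypothetical examples
stable_den = [1.0, -2 * 0.9 * math.cos(math.pi / 4), 0.81]   # poles 0.9 e^(+-j pi/4)
unstable_den = [1.0, -2.0, 0.75]                             # poles at z = 0.5 and z = 1.5
print(is_stable(stable_den), is_stable(unstable_den))
```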
EFFECTS OF THE POLES AND ZEROS OF THE TRANSFER FUNCTION
The locations of the poles and zeros of the transfer function are very important for discrete-time system analysis and synthesis. For a discrete-time system to be stable, all poles of its transfer function must lie inside the unit circle; the locations of the zeros do not affect stability. FIR filters have no feedback, which makes them always stable. IIR filters do have feedback, so the method used to convert an analog prototype into a digital filter must preserve stability. The bilinear transformation is therefore preferred, because it always maps a stable analog filter to a stable digital filter.
In the impulse invariance method, the impulse response of the digital filter equals the impulse response of the original analog filter sampled at t = nT (the related step invariance method does the same for the step response). Aliasing may occur here, but if the order N of the analog filter is high enough that its response is negligible above half the sampling frequency, the aliasing will be small enough to be acceptable, i.e., within our tolerance.
BILINEAR TRANSFORMATION
A transformation T(z): z → w is called bilinear if it takes the form
$$ w = T(z) = \frac{az + b}{cz + d}, \qquad ad - bc \neq 0 $$
This type of transformation occurs frequently in electrical engineering, for example in dielectric hysteresis, mutual impedance coupling between circuits, transmission-line calculations, propagation in a stratified medium, loudspeaker impedance, and many more.
A continuous-time (CT) signal must be appropriately band-limited in order to avoid frequency-aliasing distortion. Additionally, if the number of time samples used in a particular computation is constrained, Nyquist-rate sampling may represent the original signal poorly. In the 1960s, a basis expansion was proposed that implements a nonlinear frequency warping between a CT signal and its discrete-time (DT) representation according to the bilinear transform. Since there is a one-to-one relationship between the two frequency domains, this bilinear expansion theoretically avoids both the band-limiting requirement and the frequency-aliasing distortion associated with Nyquist sampling. Furthermore, the DT expansion coefficients can be obtained using a cascade of first-order analog systems, and modern integrated-circuit technology has made it practical to compute these coefficients with conventional circuit design techniques. Consequently, the bilinear expansion can be considered a better procedure for filter design in various applications.
In the bilinear transformation (BLT) technique, the analog frequency range [0, ∞) is compressed into the digital frequency range [0, π); that is, an infinite frequency span is compressed into a finite one. The philosophy of the BLT is as follows: given an analog transfer function $H_a(s)$, we can always simulate $H_a(s)$ with a basic analog circuit. This simulation requires summation, multiplication by a constant, and one dynamic element, namely an integrator.
The classical approach was to use op-amps for integration and to simulate any given transfer function with op-amps only. Multiplication by a constant α is done with a potentiometer if α is less than 1, or with an op-amp if α is greater than 1. Both positive and negative signs are handled with inverting and non-inverting op-amp stages.
Integration is done by putting a capacitor in the feedback loop, and this integration usually carries a negative sign. The basic fact is that with an adder, a multiplier, and a block with transfer function 1/s (an integrator), any given analog transfer function can be simulated. Once the transfer function has been simulated with adders, multipliers, and integrators, the diagram can be converted into a digital filter: addition and multiplication carry over unchanged, and the only change is that a digital integrator is required. The bilinear transform is accomplished by replacing s with
$$ s = \frac{2}{T}\,\frac{1 - z^{-1}}{1 + z^{-1}} $$
where T is the sampling period; this defines the s-plane to z-plane mapping. Here the entire jΩ axis maps onto exactly one revolution of the unit circle (whereas $z = e^{sT}$ maps the jΩ axis onto an infinite number of revolutions of the unit circle).
FIG4: STABLE TRANSFORMATION
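The digital integrator mentioned above can be obtained from the trapezoidal rule, $y[n] = y[n-1] + \frac{T}{2}(x[n] + x[n-1])$, whose transfer function is $\frac{T}{2}\frac{1 + z^{-1}}{1 - z^{-1}}$; replacing 1/s by this expression gives exactly the bilinear substitution $s = \frac{2}{T}\frac{1 - z^{-1}}{1 + z^{-1}}$. The Python sketch below (function name and sampling period are illustrative assumptions) runs the integrator on samples of x(t) = t; because the trapezoidal rule is exact for straight lines, the output matches $(nT)^2/2$ up to rounding.

```python
def trapezoidal_integrator(x, T):
    """y[n] = y[n-1] + (T/2) * (x[n] + x[n-1]), starting from zero state."""
    y = []
    y_prev, x_prev = 0.0, 0.0
    for xn in x:
        yn = y_prev + 0.5 * T * (xn + x_prev)
        y.append(yn)
        y_prev, x_prev = yn, xn
    return y

T = 0.1                                  # hypothetical sampling period
ramp = [n * T for n in range(6)]         # samples of x(t) = t
print(trapezoidal_integrator(ramp, T))   # approximately (n*T)**2 / 2
```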
PROCEDURE FOR BILINEAR TRANSFORMATION
Properties:
1) The left half of the s-plane maps to the inside of the unit circle in the z-plane, i.e., $\mathrm{Re}\{s\} < 0 \Rightarrow |z| < 1$.
2) The right half of the s-plane maps to the outside of the unit circle in the z-plane, i.e., $\mathrm{Re}\{s\} > 0 \Rightarrow |z| > 1$.
The jΩ axis, $\mathrm{Re}\{s\} = 0$, maps onto the unit circle $|z| = 1$. Hence, a causal and stable continuous-time system is mapped to a causal and stable discrete-time system.
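The mapping properties listed above can be checked numerically with the inverse form of the substitution, $z = (1 + sT/2)/(1 - sT/2)$. A minimal Python sketch; the sampling period and test points are hypothetical:

```python
def bilinear_map(s, T):
    """Inverse bilinear substitution: z = (1 + sT/2) / (1 - sT/2)."""
    return (1 + s * T / 2) / (1 - s * T / 2)

T = 1e-3                                   # hypothetical sampling period (1 ms)
points = {
    "left half-plane": -100 + 500j,
    "j-Omega axis": 2000j,
    "right half-plane": 50 + 10j,
}
for name, s in points.items():
    print(f"{name:16s} s = {s}  |z| = {abs(bilinear_map(s, T)):.6f}")
```

The left-half-plane point lands inside the unit circle, the point on the jΩ axis lands on it, and the right-half-plane point lands outside.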
Unlike the impulse invariant transformation (IIT), where the relationship is simply ω = ΩT, the BLT deviates from linearity because the relation between Ω and ω is nonlinear:
$$ \omega = 2\tan^{-1}\!\left(\frac{\Omega T}{2}\right) $$
This is how an infinite axis is compressed into a finite one: 0 to ∞ is compressed to 0 to π. This phenomenon is called warping. The frequency scale is warped, which is a disadvantage, so pre-warping (anti-warping) is applied so that the effect of warping is ultimately cancelled and the desired response is obtained. The digital frequencies ω are transformed to analog frequencies Ω by the relationship
$$ \Omega = \frac{2}{T}\tan\!\left(\frac{\omega}{2}\right) $$
This is pre-warping: the digital filter's band-edge frequencies are pre-warped to analog frequencies, which gives the specifications of the corresponding analog filter. There is no aliasing in the bilinear transformation, because the entire transfer function is transformed and the mapping from the jΩ axis to the unit circle is one-to-one; in the impulse invariant transformation, only the poles are mapped directly. For the ratio of the stopband and passband edge frequencies, the BLT gives $\Omega_s/\Omega_p = \tan(\omega_s/2)/\tan(\omega_p/2)$, whereas in the IIT it is simply $\omega_s/\omega_p$ because the relationship is linear. The IIT relation is an approximation of the BLT relationship: for small θ, tan θ ≈ θ, so $\Omega = (2/T)\tan(\omega/2) \approx \omega/T$, which is the IIT relation. For small ω or Ω, the IIT and BLT are therefore indistinguishable.
Alternatively, if we have an inverse bilinear transform, we can follow these steps:
1. Use the inverse bilinear transform on the filter specifications in the digital domain to produce equivalent specifications in the analog domain.
2. Construct the analog filter transfer function to meet those specifications.
3. Use the bilinear transform to convert the resulting analog filter into a digital filter.
4. The inverse bilinear transform can be expressed as $z = (1 + sT/2)/(1 - sT/2)$, which reduces to $z = (1 + s)/(1 - s)$ for T = 2.
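Pre-warping can be illustrated with a short Python sketch. The sampling rate and band edges below are hypothetical; the two functions simply implement $\Omega = \frac{2}{T}\tan(\omega/2)$ and its inverse $\omega = 2\tan^{-1}(\Omega T/2)$.

```python
import math

def prewarp(omega_digital, T):
    """Analog frequency Omega = (2/T) tan(w/2) in rad/s for digital frequency w in rad/sample."""
    return (2.0 / T) * math.tan(omega_digital / 2.0)

def warp(omega_analog, T):
    """Digital frequency w = 2 atan(Omega T / 2) produced by the BLT."""
    return 2.0 * math.atan(omega_analog * T / 2.0)

fs = 8000.0                    # hypothetical sampling rate (Hz)
T = 1.0 / fs
fp, fstop = 1000.0, 2000.0     # hypothetical passband / stopband edges (Hz)
wp, ws = 2 * math.pi * fp / fs, 2 * math.pi * fstop / fs   # digital edges (rad/sample)
Wp, Ws = prewarp(wp, T), prewarp(ws, T)                    # pre-warped analog edges (rad/s)

print("pre-warped edges (Hz):", Wp / (2 * math.pi), Ws / (2 * math.pi))
print("edge ratio BLT:", Ws / Wp, " linear (IIT):", ws / wp)
```

For these values the pre-warped edge ratio is $\tan(\pi/4)/\tan(\pi/8) = 1 + \sqrt{2} \approx 2.414$, compared with the linear (IIT) ratio of 2, which shows how much warping would distort the transition band if it were not compensated.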
Two well-known methods, the impulse invariance method and the matched z-transform method, are conceptually similar to sampling a continuous waveform. Denoting the inverse Laplace transform by $\mathcal{L}^{-1}$ and the z-transform by $\mathcal{Z}$, both methods involve calculating the impulse response of the analog filter as $a(t) = \mathcal{L}^{-1}\{A(s)\}$ and sampling a(t) with a sampling interval T that is small enough to avoid aliasing. The transfer function of the digital filter is then obtained from the sampled sequence a[n] as $D_a(z) = \mathcal{Z}\{a[n]\}$. However, there are key differences between the two.
Impulse invariance method: In this method, the analog transfer function is expanded into partial fractions as
$$ A(s) = \sum_{m=1}^{N} \frac{C_m}{s - \alpha_m} $$
where the $C_m$ are constants and the $\alpha_m$ are the poles. Mathematically, any transfer function whose numerator has a lower degree than its denominator can be expressed as such a sum of partial fractions. Low-pass (and band-pass) prototypes satisfy this criterion, whereas high-pass and band-stop filters have numerator and denominator of the same degree, and their responses are not band-limited, so they would alias badly. Hence the impulse invariance method cannot be used to design high-pass or band-stop filters.
Matched z-transform: In this method, instead of splitting the impulse response into partial fractions, both the zeros $\beta_m$ and the poles $\alpha_m$ are transformed in the same (matched) way, $\beta_m \to e^{\beta_m T}$ and $\alpha_m \to e^{\alpha_m T}$ (which also preserves stability), giving
$$ D_a(z) = k\,\frac{\prod_{m}\left(1 - e^{\beta_m T} z^{-1}\right)}{\prod_{m}\left(1 - e^{\alpha_m T} z^{-1}\right)} $$
The limitations of both methods are easy to see: impulse invariance is applicable only to low-pass (and band-pass) filters, while the matched z-transform method is applicable to band-stop and band-pass filters (and to high-pass filters up to the Nyquist frequency).
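A minimal Python sketch of the impulse invariance mapping, assuming the partial-fraction residues $C_m$ and poles $\alpha_m$ are already known (the example filter and function names are hypothetical). Each analog pole $\alpha_m$ becomes the digital pole $e^{\alpha_m T}$, and the digital impulse response reproduces the samples a(nT) of the analog one. The sketch follows the definition $D_a(z) = \mathcal{Z}\{a[n]\}$ with a[n] = a(nT); many textbooks additionally scale each term by T.

```python
import cmath
import math

def impulse_invariance(residues, analog_poles, T):
    """Map A(s) = sum C_m / (s - alpha_m) to D_a(z) = sum C_m / (1 - e^(alpha_m T) z^-1).

    Returns a list of (C_m, digital pole e^(alpha_m T)) pairs (no extra factor T).
    """
    return [(c, cmath.exp(alpha * T)) for c, alpha in zip(residues, analog_poles)]

def impulse_response(sections, n_samples):
    """a[n] = sum C_m * (e^(alpha_m T))^n, the inverse z-transform of the partial fractions."""
    return [sum(c * p ** n for c, p in sections).real for n in range(n_samples)]

# Hypothetical low-pass A(s) = 1/(s + 1) - 1/(s + 2) = 1/((s + 1)(s + 2))
T = 0.1
sections = impulse_invariance([1.0, -1.0], [-1.0, -2.0], T)
h_digital = impulse_response(sections, 6)
h_sampled = [math.exp(-n * T) - math.exp(-2 * n * T) for n in range(6)]   # a(nT)
print([round(v, 6) for v in h_digital])
```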
Digital filters designed via the bilinear transformation are guaranteed to be stable. However, the exact coefficient values exist only immediately after the bilinear transformation has been applied; on filter realization, it is impossible to represent the coefficients without error. In a software realization (implementation) of a digital filter, the resulting coefficients are quantized, which also introduces a certain error. Any error made during coefficient quantization affects the frequency response to a greater or lesser extent and may, for example, reduce the stopband attenuation.
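The effect of coefficient quantization can be seen even on a single second-order section. In this Python sketch (the pole position and word lengths are hypothetical, and simple rounding is assumed as the quantizer), the denominator coefficients $a_1 = -2r\cos\theta$ and $a_2 = r^2$ are rounded to a given number of fractional bits and the resulting pole radii are printed.

```python
import cmath
import math

def quantize(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits (fixed-point rounding)."""
    step = 2.0 ** -frac_bits
    return round(value / step) * step

def biquad_poles(a1, a2):
    """Roots of z^2 + a1 z + a2, i.e. the poles of 1 / (1 + a1 z^-1 + a2 z^-2)."""
    disc = cmath.sqrt(a1 * a1 - 4.0 * a2)
    return (-a1 + disc) / 2.0, (-a1 - disc) / 2.0

# Hypothetical narrow-band section: conjugate poles at radius 0.99, angle 0.05 rad
r0, theta0 = 0.99, 0.05
a1, a2 = -2.0 * r0 * math.cos(theta0), r0 * r0

for bits in (15, 8, 5):
    poles = biquad_poles(quantize(a1, bits), quantize(a2, bits))
    print(bits, "fractional bits -> pole radii", [round(abs(p), 5) for p in poles])
```

With 15 fractional bits the poles stay very close to radius 0.99. With only 5 fractional bits, in this particular example the poles become real and one of them lands exactly on the unit circle at z = 1, so the quantized section is no longer stable.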
IIR FILTER REALIZATION
The IIR filter transfer function can be expressed as
$$ H(z) = \frac{\sum_{k=0}^{M-1} b_k z^{-k}}{1 + \sum_{k=1}^{N} a_k z^{-k}} $$
where
- N is the filter order,
- $b_k$ are the coefficients of the non-recursive part of the IIR filter,
- $a_k$ are the coefficients of the feedback (recursive) part of the IIR filter.
The IIR filter difference equation can be expressed as
$$ y[n] = b_0 x[n] + b_1 x[n-1] + \dots + b_{M-1} x[n-(M-1)] - a_1 y[n-1] - a_2 y[n-2] - \dots - a_N y[n-N] $$
The block diagram of the IIR filter is as follows:
FIG5: BLOCK DIAGRAM OF IIR FILTER
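The difference equation above translates directly into code. This is a minimal, unoptimized Python sketch; the function name is hypothetical, and zero initial conditions and $a_0 = 1$ are assumed, matching the form of H(z) above.

```python
def iir_filter(b, a, x):
    """Direct evaluation of y[n] = sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k], with a[0] == 1."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, bk in enumerate(b):          # non-recursive (feed-forward) part
            if n - k >= 0:
                acc += bk * x[n - k]
        for k in range(1, len(a)):          # recursive (feedback) part
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y

# Hypothetical first-order example: y[n] = 0.5 x[n] + 0.5 y[n-1]
impulse = [1.0, 0.0, 0.0, 0.0]
print(iir_filter([0.5], [1.0, -0.5], impulse))   # [0.5, 0.25, 0.125, 0.0625]
```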
DIRECT REALIZATION
Direct realization of IIR filters starts with the expression
$$ y[n] = \sum_{k=0}^{M-1} b_k x[n-k] - \sum_{k=1}^{N} a_k y[n-k] $$
The first sum is the non-recursive part and the second is the recursive part of the IIR filter. In direct realization, these two parts are considered and realized separately. The realization of the non-recursive part is identical to the direct realization of an FIR filter; the figure illustrates the block diagram of the direct realization of the non-recursive part. The recursive part is realized in a similar way, as the figure of the direct realization of the recursive part illustrates.
Since the non-recursive and recursive parts of the IIR filter are realized separately, it does not matter which of them is applied first in the filtering process. Direct realization is very convenient for software implementation, and this is where it is most commonly used. Its disadvantages are the greatest sensitivity to the accuracy of the realized coefficients (i.e., the largest finite word-length effect) and the greatest implementation complexity (i.e., it needs the most resources).
CASCADE REALIZATION
The cascade realization structure is the most difficult to obtain from the transfer function compared with the other realization structures described here. It is very convenient because of its modular structure and its lower sensitivity to the accuracy with which the non-recursive and recursive coefficients are realized. In cascade IIR filter realization, the filter is divided into several mutually independent first- or second-order sections. Since the sections are mutually independent after the design process, the finite word-length effect on coefficient accuracy, on the frequency response, and on IIR filter stability can be examined separately for each section, which simplifies the analysis.
The IIR filter transfer function is expressed as
$$ H(z) = \frac{B(z)}{A(z)} = \frac{\sum_{i=0}^{N} b_i z^{-i}}{1 + \sum_{j=1}^{N} a_j z^{-j}} = H_0\,\frac{\prod_{i=1}^{N}\left(1 - q_i z^{-1}\right)}{\prod_{j=1}^{N}\left(1 - p_j z^{-1}\right)} $$
where
- $b_i$ are the coefficients of the transfer function numerator;
- $a_j$ are the coefficients of the transfer function denominator;
- $H_0$ is a constant;
- $q_i$ are the zeros of the transfer function;
- $p_j$ are the poles of the transfer function;
- B(z) is the transfer function of the non-recursive part;
- A(z) is the transfer function of the recursive part;
- M is the number of sections in the cascade realization structure.
Cascade realization requires this expression to be factorized so that the transfer function is expressed as
$$ H(z) = \prod_{i=1}^{M} \frac{b[i,0] + b[i,1]\,z^{-1} + b[i,2]\,z^{-2}}{1 + a[i,1]\,z^{-1} + a[i,2]\,z^{-2}} $$
where a[i, k] are the coefficients of the recursive part of the i-th IIR filter section and b[i, k] are the coefficients of the non-recursive part of the i-th section. The figure illustrates a second-order section.
FIG6: SECOND ORDER FILTER
Using the transposed direct-form structure for each section reduces the required number of delay elements and adders. Dividing the filter into independent sections reduces the sensitivity to coefficient quantization and simplifies the stability analysis of the resulting filter. Moreover, because coefficient quantization is performed after the filter has been divided into sections, the changes in pole locations are smaller, so the possibility that the IIR filter becomes unstable after quantization is drastically reduced. A software realization requires M buffers of length 2 (or 1 for a first-order section): each section needs its own buffer for storing samples of its intermediate signals. This complexity and the required factorization are the two main disadvantages of this realization structure. The figure illustrates the block diagram of the cascade IIR filter structure.
FIG7: CASCADE REALIZATION OF IIR FILTER
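A minimal Python sketch of a cascade realization in which each second-order section is implemented in transposed direct form II and keeps its own two-element state buffer, as described above. The tuple layout `(b0, b1, b2, a1, a2)` and the function names are assumptions for illustration; a first-order section is obtained by setting `b2 = a2 = 0`.

```python
def biquad_tdf2(section, x, state):
    """Process one sample through a second-order section in transposed direct form II.

    section = (b0, b1, b2, a1, a2) for (b0 + b1 z^-1 + b2 z^-2) / (1 + a1 z^-1 + a2 z^-2);
    state = [s1, s2] is this section's own buffer and is updated in place.
    """
    b0, b1, b2, a1, a2 = section
    y = b0 * x + state[0]
    state[0] = b1 * x - a1 * y + state[1]
    state[1] = b2 * x - a2 * y
    return y

def cascade_filter(sections, x):
    """Run the input through all sections in series, sample by sample."""
    states = [[0.0, 0.0] for _ in sections]   # one length-2 buffer per section
    out = []
    for sample in x:
        for sec, st in zip(sections, states):
            sample = biquad_tdf2(sec, sample, st)
        out.append(sample)
    return out

# Hypothetical example: two identical first-order sections y[n] = 0.5 x[n] + 0.5 y[n-1]
first_order = (0.5, 0.0, 0.0, -0.5, 0.0)
print(cascade_filter([first_order, first_order], [1.0, 0.0, 0.0, 0.0]))
```

Cascading the two hypothetical sections gives the impulse response $(n + 1)\,0.5^{n+2}$, i.e. 0.25, 0.25, 0.1875, 0.125 for the first four samples.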
VHDL - THE LANGUAGE
The VHSIC Hardware Description Language (VHDL) is an industry-standard language used to describe hardware from the abstract to the concrete level. The language defines not only the syntax but also very clear simulation semantics for each language construct. Although it provides an extensive range of modelling capabilities, a core subset of the language that is easy to understand can be learned quickly without the more complex features. It is very useful for teaching top-down design: a system can be designed at a high level with its algorithm expressed in VHDL, and then simulated and debugged at this level before proceeding with detailed logic design. A dataflow description offers a combination of the behavioural and structural levels of description.
LEVELS OF ABSTRACTION
1) Dataflow level: the flow of data through the entity is expressed using concurrent signal assignment statements.
2) Structural level: the entity is described as a set of interconnected components.
3) Behavioural level: the behaviour of the entity is specified as a set of statements that are executed sequentially in the specified order.
The filters in this report are described in VHDL using two types of arithmetic:
1) Bit-parallel arithmetic
2) Bit-serial arithmetic
1) In bit-parallel arithmetic, the inputs to an operation are stored in registers and all bits are conceptually processed at once: all bits of the inputs are applied in parallel, all bits of the output appear simultaneously, and the output is stored in registers. An advantage of bit-parallel arithmetic is that the amount of work performed by a processing element during one clock cycle is relatively large, so the clock frequency can be kept low while the computational speed remains high.
  24. 24. DIGITAL FILTER DESIGN 24 Disadvantage of bit-parallel arithmetic is that it has high power consumption and chip area as compared to bit-serial arithmetic. 2) In serial arithmetic one bit of the input data is processed in each clock cycle, generally starting with the LSB. Advantage in bit-serial arithmetic is its power consumption. Bit serial digital filters have less power consumption because of serial parallel multiplier. Also it consumes smaller area compared to bit parallel. Disadvantage of bit-serial arithmetic is their design complexity. The design time for the bit-serial system increases due to the higher complexity of timing the bit-serial streams. The potential performance of bit-serial processing elements may be somewhat degraded due to practical problems with high frequency clocking. BIT SERIAL ARITHMETIC & BIT PARALLEL ARITHMETIC Numbers may be described as floating-point or fixed-point numbers. Floating-point numbers use a signed mantissa M and a signed exponent E to represent a number F = M×βE ,where β is the base of the number. Fixed point numbers on the other hand have a fixed exponent with the binary point in the mantissa is always located at the same position, independent of the represented number. The variable exponent in the floating-point number representation enables a large number range, but quantization introduces value-dependent errors which may be troublesome in some algorithms. Most DSP algorithms do not require the increased number range of floating-point numbers if appropriate measures are taken to scale the signal levels in the algorithm. Parasitic oscillations in a system using floating-point arithmetic are in general harder to suppress compared to a system using fixed-point arithmetic. Implementation of fixed-point arithmetic is also less complex compared to floating-point arithmetic, making fixed-point arithmetic the preferred number representation in many cases. Floating-point arithmetic thus becomes slower,
  25. 25. DIGITAL FILTER DESIGN 25 consumes more power, and requires more chip area. Signed fixed-point numbers can be described using various representations. One representation is sign-magnitude representation, where a sign bit denotes the sign of the number, and the rest of the bits denote the magnitude. There is in this case two representations of the number zero that will increase the complexity of the implementation of additions and subtractions. Other representations include one’s- complement, two’s complement, bias, and signed-digit code. Most fixed-point systems use two’s-complements to represent signed fixed point numbers. Signed addition and subtraction are then treated as unsigned addition and subtraction. The most significant bit has a negative weight, while the other bits have positive weight. Two’s- complement representations will be assumed throughout the rest of the text. A number X represented in two’s complement is shown in Eq. (4.1). The number range is here limited to -1 ≤ X < 1. Larger number ranges is achieved by scaling the representation by a factor 2k, where k is the required number of integer bits. ……………………………………………Eq.4.1. One important property of two’s-complement representation is that a sum of numbers can be computed in an arbitrary order. An overflow in an intermediate result can be neglected if the correct sum is within the available number range. This means that the order of the additions is unimportant with regard to overflow, and it is therefore possible to rearrange the order without affecting the final result. There are beside the ordinary binary number representations also redundant representations [8], with multiple representations of a single number. Operations involving comparison of numbers using this type of representation is however often difficult to implement. Some of these redundant representations are easy to convert into ordinary binary numbers, e.g., signed-digit code. 
Others, such as residue number systems, are difficult to encode and decode to and from ordinary non-redundant binary numbers, but they are efficient for certain operations as long as no conversion between the number systems is required.
The most common operations in DSP algorithms are additions, subtractions, and multiplications. Multiplications by fixed coefficients are common, which allows the designer to simplify the hardware. Such simplifications save resources and may also speed up the computation.

1) BIT-PARALLEL ARITHMETIC:-
Typically, the inputs and outputs of a bit-parallel arithmetic operation are stored in registers. In bit-parallel arithmetic, all bits are conceptually processed at once, i.e., all bits of the inputs are applied in parallel and all bits of the output are produced simultaneously. In practice, however, part of the processing is still sequential; in an adder, for example, the carries propagate from one bit position to the next. An advantage of bit-parallel arithmetic compared to bit-serial arithmetic is that a processing element performs a relatively large amount of work during one clock cycle, so the clock frequency can be kept low.

ADDITION AND SUBTRACTION:-
A sum Z of two numbers X and Y in two's-complement representation is computed by adding the bits pairwise, as shown in Eq. (4.2). Carries propagate from the least significant bit (LSB) up to the most significant bit (MSB).

$$Z = X + Y = -(x_0 + y_0) + \sum_{i=1}^{W_d-1} (x_i + y_i)\, 2^{-i} \qquad \text{(Eq. 4.2)}$$

This can be implemented in parallel with a set of full adders. Each full adder adds the bits of one significance level together with the carry from the next lower level. A straightforward implementation is shown in Fig. 4.1. The carry input at the LSB is set to zero, and the carry output of each significance level is connected to the next higher level. The result bit s_i depends on every input bit of equal or lower significance. There is therefore a combinatorial path from the LSB through all full adders to the MSB, which results in a long propagation delay.
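The ripple-carry structure described above can be sketched as a bit-level Python model. It is a behavioral illustration, not the project's VHDL adder. Bit vectors are MSB first (index 0 is the sign bit, matching Eq. 4.1), which is an assumption of this sketch. The function names are hypothetical.

```python
def full_adder(x, y, c):
    """One-bit full adder: returns (sum, carry_out)."""
    s = x ^ y ^ c
    carry = (x & y) | (x & c) | (y & c)
    return s, carry


def ripple_carry_add(x_bits, y_bits, carry_in=0):
    """Add two equal-length bit vectors given MSB first (index 0 = sign bit).

    The carry ripples from the LSB (last index) to the MSB (index 0) through one
    full adder per bit, so the carry chain is as long as the word length Wd.
    Returns (sum_bits, carry_out); in two's complement the final carry is dropped.
    """
    assert len(x_bits) == len(y_bits)
    s = [0] * len(x_bits)
    c = carry_in                             # carry input at the LSB
    for i in reversed(range(len(x_bits))):   # LSB first, MSB last
        s[i], c = full_adder(x_bits[i], y_bits[i], c)
    return s, c


# 0.011 + 0.001 = 0.100 in binary (0.375 + 0.125 = 0.5)
print(ripple_carry_add([0, 0, 1, 1], [0, 0, 0, 1]))  # ([0, 1, 0, 0], 0)
```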
FIG 8: Bit-parallel ripple-carry adder.

In the worst case the result is computed sequentially, starting at the LSB and generating carry values up to the MSB. Many techniques have been proposed to avoid long carry-propagation paths, e.g., carry look-ahead, carry-save, and carry-select adders. A common property of these solutions is that they need more resources than the ripple-carry implementation in order to speed up the computation. Implementations using the simple full-adder-based structure in Fig. 4.1 also generate unwanted switching in the logic, because incorrect intermediate results are computed before the correct carry has reached a bit-level stage. The number of full adders, and therefore also the carry-propagation path that limits the addition time, is proportional to the data word length Wd.

Subtraction is carried out with the same structure as addition. The sign of a two's-complement number is changed by inverting all bits and adding one at the LSB position. Addition is therefore converted into subtraction by inverting the bits of the value to be subtracted and setting the input carry bit at the LSB.

MULTIPLICATION:-
Binary multiplication may be carried out with a scheme similar to ordinary hand calculation. An array of partial-product terms is generated and then added, as shown in Fig. 4.2. Each dot in the summation array corresponds to the product of two digits, which in the binary case is equivalent to a logic AND of two bits.

FIG 9: Matrix of partial products generated in multiplication.

The partial products can be summed in various ways [8]. The straightforward method uses one full adder for the addition of each dot and results in the array multiplier shown in Fig. 4.3. Its multiplication time is proportional to the sum of the data word length and the coefficient word length (propagating down and then from right to left). The required area is proportional to the product of the data word length and the coefficient word length (Wd × Wc).

FIG 10: Array multiplier for two's-complement numbers.

Other methods of adding the partial-product terms include Wallace trees and similar structures, where carry propagation is reduced by changing the order in which the input data are added [10]. Such schemes use a tree-like adder structure to speed up the additions and thereby reduce the propagation delay. Carries are only propagated from one level to the next, resulting in short combinatorial paths. Only the final step, where the last two intermediate results are added, requires a carry-propagate adder.

2) BIT-SERIAL ARITHMETIC:-
In bit-serial arithmetic, one bit of the input data is processed in each clock cycle, generally starting with the LSB. The complexity of an operation is low, since there are few input bits to operate on in each clock cycle. Combinatorial paths through the logic are short and allow high bit rates, which makes the total computation time comparable to that of bit-parallel ripple-carry implementations. Bit-serial arithmetic results in small processing elements, so the total chip area is smaller and the interconnections between processing elements are shorter. This allows a higher clock frequency and also reduces power consumption, since the capacitive loads on the gates are smaller.

ADDITION AND SUBTRACTION:-
A bit-serial adder adds two bits during one clock cycle and generates a sum bit. It also generates a carry bit, which is added in the next clock cycle, as shown on the left in Fig. 4.4. The carry is saved in a flip-flop, which is reset at the start of the addition. This reset of the D flip-flop corresponds to the zero at the LSB carry input in the bit-parallel case.

FIG 11: Bit-serial adder and subtractor.

The area of the adder is independent of the data word length, but the number of clock cycles is proportional to the word length. Power consumption is lower than for a long bit-parallel ripple-carry implementation, because the combinatorial depth of the circuit is smaller and the output is computed correctly directly, without excessive switching.

Subtraction may be implemented as in the bit-parallel case, i.e., by changing the sign of the subtrahend through inverting all bits and adding one at the LSB position. In the bit-serial case, an adder with one inverted input is sufficient to implement subtraction, as shown on the right in Fig. 4.4. The carry flip-flop is set at the beginning of the subtraction.

MULTIPLICATION:-
Multiplication of two numbers can be accomplished with two bit-serial inputs, generating a bit-serial output [1]. However, many DSP algorithms, such as digital filters and FFTs, only multiply data by fixed coefficients. Only this type of multiplication is discussed here.
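The bit-serial adder and subtractor of Fig. 4.4 can be sketched as a cycle-by-cycle Python model. This is a behavioral illustration, not the project's VHDL. Each loop iteration stands for one clock cycle, and the variable `carry` plays the role of the carry flip-flop. The helper names are hypothetical.

```python
def to_bits_lsb_first(value, width):
    """Two's-complement bits of an integer, LSB first (the bit-serial order)."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits_lsb_first(bits):
    """Integer value of a two's-complement bit list given LSB first."""
    value = sum(b << i for i, b in enumerate(bits))
    if bits[-1]:                    # sign bit set: subtract 2**width
        value -= 1 << len(bits)
    return value


def bit_serial_add_sub(x_bits, y_bits, subtract=False):
    """One sum bit per clock cycle, LSB first.

    The carry flip-flop is reset (0) before an addition and set (1) before a
    subtraction; for subtraction the y input is inverted, so the initial 1 and
    the inverted bits together form -y.
    """
    carry = 1 if subtract else 0    # initial state of the carry flip-flop
    out = []
    for xb, yb in zip(x_bits, y_bits):
        if subtract:
            yb ^= 1                 # inverted adder input
        out.append(xb ^ yb ^ carry)
        carry = (xb & yb) | (xb & carry) | (yb & carry)  # stored for the next cycle
    return out


W = 4
print(from_bits_lsb_first(bit_serial_add_sub(to_bits_lsb_first(5, W), to_bits_lsb_first(-3, W))))  # 2
print(from_bits_lsb_first(bit_serial_add_sub(to_bits_lsb_first(5, W), to_bits_lsb_first(3, W), subtract=True)))  # 2
```

The hardware cost does not depend on W, while the number of loop iterations (clock cycles) grows with it. This matches the area/time trade-off described above.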
SHIFT-AND-ADD MULTIPLIERS:-
A common case is multiplication by a fixed coefficient. It may be realized as the multiplication of a bit-serial input of Wd bits by a bit-parallel coefficient of Wc bits, generating a bit-serial output of Wd+Wc-1 bits. Both the input and output bit-serial data streams are in LSB-first order. The shift-and-add multiplier computes the product by adding the rows of the matrix representation, generating a new row of bits after each addition. The input stage of a shift-and-add multiplier is a row of AND gates that multiplies the serial input bit bitwise by the parallel coefficient. In Fig. 4.5 this stage is implemented as a multiplexer that selects either the coefficient a or zero. The result of this bitwise multiplication is added to the partial product, and the accumulated sum is then shifted one position to the right. The rightmost bit yields one bit of the product. Once the last addition is completed, the multiplier is clocked for Wc additional clock cycles with a zero input in order to shift out the Wc most significant bits.

FIG 12: Shift-and-add multiplier.

When the coefficient is in two's-complement form, the shifting must be done with arithmetic shifts, copying the sign bit, since the intermediate result may be negative. Serial input data in two's-complement form need special treatment compared to positive binary data. The last bit, the sign bit, has a bit weight of -1. The sign bit should therefore be multiplied by the coefficient, and the resulting partial product should then be subtracted from the accumulated sum. One approach is to include logic that converts the addition of (x0 × a) into a subtraction. Finally, the last Wc bits are generated while the serial input is kept at zero.

Another approach to handling the sign bit is to sign-extend the serial input [4]. The subtraction in the multiplication of two's-complement numbers may be eliminated by sign-extending the serial input, as shown in Eq. (4.3). The left part of the last expression is the subtraction of the coefficient. It only contributes to the product at bit positions with bit weights above 2^0. The right part of the last expression only contributes to the product at bit positions with bit weights up to 2^0. The final subtraction is therefore not required, and the multiplier can be implemented using only additions. The sign-extension logic may consist of a latch.

(Eq. 4.3)

The multiplication time of a serial/parallel multiplier is Wd+Wc-1 clock cycles, where Wd is the bit-serial data word length and Wc is the coefficient word length. The maximum clock frequency is limited by the addition time of a one-bit adder. Only the least significant bit is used as output in each clock cycle, so the rest of the intermediate result may be kept in any number format. A redundant representation of the intermediate result is therefore acceptable, as long as the LSB is computed correctly. Carry-save adders, with their short combinatorial paths, therefore allow a high clock frequency. Shifting is performed automatically in each clock cycle by the wiring.
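The first approach to the sign bit (subtracting the sign bit's partial product) can be sketched in Python. It is a behavioral model, not a gate-level description. The data and the coefficient are treated as scaled integers, which gives the same product bits as the fractional view. The names `shift_add_multiply` and `bits_to_int` are hypothetical.

```python
def bits_to_int(bits):
    """Two's-complement value of a bit list given LSB first."""
    value = sum(b << i for i, b in enumerate(bits))
    return value - (1 << len(bits)) if bits[-1] else value


def shift_add_multiply(x, a, wd, wc):
    """Bit-serial x (wd bits, LSB first) times a bit-parallel coefficient a (wc bits).

    Each clock cycle: form (serial bit AND a), add it to the accumulator (or
    subtract it for the sign bit, whose weight is -1), output the accumulator
    LSB, then shift right arithmetically (Python's >> copies the sign bit).
    After the wd data cycles, zeros are clocked in to shift out the remaining
    bits. Returns the wd + wc - 1 product bits, LSB first.
    """
    acc = 0
    out = []
    for i in range(wd + wc - 1):
        bit = (x >> i) & 1 if i < wd else 0           # serial input, then zeros
        pp = a if bit else 0                          # AND-gate / multiplexer stage
        acc = acc - pp if i == wd - 1 else acc + pp   # subtract at the sign bit
        out.append(acc & 1)                           # rightmost bit is a product bit
        acc >>= 1                                     # arithmetic shift right
    return out


print(bits_to_int(shift_add_multiply(-3, 5, wd=4, wc=4)))  # -15
```

With Wd = Wc = 4, the 7-bit output covers every product except (-8) × (-8) = 64. This reflects the usual exception for the Wd+Wc-1-bit product length.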
SERIAL/PARALLEL MULTIPLIERS:-
An alternative realization of the shift-and-add algorithm is shown in Fig. 4.6. This realization is referred to as a serial/parallel (S/P) multiplier [4]. It consists of two parts: the first generates the partial bit products, and the second is a so-called shift-accumulator. A serial/parallel multiplier requires little chip area and can be clocked at a high frequency [11]. Serial/parallel multipliers are natural building blocks for more complex operations. For example, a processing element corresponding to a two-port adaptor can be built from a single multiplier, three bit-serial adders, and a number of D flip-flops. Several digital filters with multiplexed processing elements of this type have been implemented successfully in both standard-cell and full-custom layout styles [9, 12, 6].

FIG 13: Serial/parallel multiplier using a shift-accumulator.

S/P MULTIPLICATION WITH FIXED COEFFICIENTS:-
The serial/parallel multiplier structure may be simplified significantly if the coefficient is fixed [5]. For a positive coefficient, the number of full adders in a simplified implementation equals the number of non-zero bits in the coefficient minus one. For a negative coefficient, it equals the number of non-zero bits. Procedures for simplifying serial/parallel multipliers with fixed coefficients, in either two's-complement or signed-digit code, are presented in [2]. An example of a simplified serial/parallel multiplier is shown in Fig. 4.7; the logic drawn with dotted lines can be removed.
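The counting rule above can be illustrated with a short arithmetic sketch in Python. It is not a gate-level model. A coefficient is assumed to be written as a string "s.fff", where the bit before the point is the sign bit with weight -1. Only the non-zero bits contribute a shifted copy of the input, which is why zero bits cost no adders. The function names are hypothetical.

```python
from fractions import Fraction


def coefficient_terms(coeff):
    """Non-zero bit weights of a two's-complement coefficient written as 's.fff'."""
    sign, frac = coeff.split(".")
    weights = [Fraction(-1)] if sign == "1" else []      # sign bit has weight -1
    weights += [Fraction(1, 2 ** i) for i, b in enumerate(frac, start=1) if b == "1"]
    return weights


def full_adders_needed(coeff):
    """Full-adder count of a simplified S/P multiplier, per the rule in the text."""
    nonzero = len(coefficient_terms(coeff))
    if coeff.startswith("1"):      # negative coefficient
        return nonzero
    return max(nonzero - 1, 0)     # positive coefficient


def fixed_coefficient_product(x, coeff):
    """Multiply by a fixed coefficient using only shifted copies of x for the non-zero bits."""
    return sum((x * w for w in coefficient_terms(coeff)), Fraction(0))


print(full_adders_needed("0.011"), fixed_coefficient_product(Fraction(1, 2), "0.011"))  # 1 3/16
print(full_adders_needed("1.101"), fixed_coefficient_product(Fraction(1, 2), "1.101"))  # 3 -3/16
```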
FIG 14: Simplified serial/parallel multiplier with coefficient 0.011₂.

Multiplication generates a product with a longer word length than the serial input. The number of fractional bits in the coefficient determines the number of extra bits (of lower significance than the input data). These additional bits must be truncated or rounded in a recursive path.

LATENCY:-
Computational speed is characterized by two parameters, latency and throughput. The latency of an operation is defined as the time required for an input of a given significance level to affect the output at the same significance level [7, 3, 4]. In other words, it describes how long it takes for an input value to be transformed into an output value. It is often convenient to measure latency in clock cycles rather than in real time units. Latency depends on the function of the processing element (PE). One example is the simplified serial/parallel multiplier in Fig. 4.8, which can multiply by either 0.11₂ or 0.011₂ without changing the structure. The latency is, however, different in the two cases, since multiplication by 0.011₂ generates one more fractional bit than multiplication by 0.11₂. The 0.011₂ case therefore needs one more clock cycle before a result bit of the same significance level is available at the output.
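A worked check of the extra-bit argument, as a small Python sketch that uses exact fractions. The input value 0.101₂ is an arbitrary example. The sketch assumes that one product bit leaves the multiplier per clock cycle, LSB first, so each extra fractional bit corresponds to one extra cycle of latency. `fractional_bits` is a hypothetical helper.

```python
from fractions import Fraction


def fractional_bits(value):
    """Number of fractional bits needed to represent a binary (dyadic) fraction exactly."""
    den = Fraction(value).denominator
    assert den & (den - 1) == 0, "value must be a binary (dyadic) fraction"
    return den.bit_length() - 1


x = Fraction(5, 8)    # serial input 0.101 in binary (3 fractional bits)
c1 = Fraction(3, 4)   # coefficient 0.11 in binary
c2 = Fraction(3, 8)   # coefficient 0.011 in binary

print(fractional_bits(x * c1), fractional_bits(x * c2))  # 5 6
```

The product with 0.011₂ has one more fractional bit than the product with 0.11₂. A result bit of a given significance therefore appears one clock cycle later.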
FIG 15: Simplified multiplier structures for the fixed coefficients 0.11₂ and 0.011₂.

THROUGHPUT:-
Throughput is defined as the reciprocal of the time between successive outputs, as illustrated in Fig. 4.9 [7, 3, 4]. It is measured in operations per time unit.

FIG 16: Latency and throughput of a processing element.

Throughput is not directly tied to latency, and the throughput of a system can be changed without affecting its latency. Fig. 4.10 illustrates this: the throughput of a system consisting of a single multiplier is doubled by interleaving two multipliers, while the latency remains unchanged.
FIG 17: Increased throughput without affecting latency.

The upper and lower limits on throughput and latency depend on the technology used for the implementation.
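The interleaving idea of Fig. 4.10 can be sketched as a small Python timing model. It assumes that each multiplier is busy for `latency` clock cycles per operation, that inputs are distributed round-robin, and that inputs arrive every `latency // copies` cycles. The function `schedule` and all numbers are illustrative, not taken from the figure.

```python
def schedule(n_inputs, latency, copies):
    """(arrival, finish) times in clock cycles for round-robin interleaved units."""
    interval = latency // copies          # input rate the interleaved system accepts
    free_at = [0] * copies                # cycle at which each unit becomes free
    events = []
    for i in range(n_inputs):
        arrival = i * interval
        unit = i % copies                 # round-robin assignment
        start = max(arrival, free_at[unit])
        finish = start + latency
        free_at[unit] = finish
        events.append((arrival, finish))
    return events


for copies in (1, 2):
    ev = schedule(6, latency=4, copies=copies)
    gaps = {b[1] - a[1] for a, b in zip(ev, ev[1:])}   # time between successive outputs
    lat = {f - a for a, f in ev}                       # input-to-output delay
    print(copies, "unit(s): output every", gaps, "cycles, latency", lat)
```

With one unit, outputs appear every 4 cycles. With two interleaved units, they appear every 2 cycles, so throughput doubles, while every input still takes 4 cycles to reach the output.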
IMPLEMENTATION & ANALYSIS OF SUB-BLOCKS

A filter consists of various sub-blocks such as adders, multipliers, and delays. To design a filter, these sub-blocks are designed first and then combined as required. This chapter describes the design, implementation, and analysis of the sub-blocks needed for filter design.

IMPLEMENTATION OF ADDER SUB-BLOCKS USING VHDL:-
The following figure illustrates a block diagram of the 15-bit fixed-point adder sub-block.

FIG 18: Two-input adder block.

To design a 15-bit full adder, a single-bit three-input adder (full adder) is created first. The 15-bit adder is then built by port-mapping instances of this three-input adder block. This generic VHDL code for the 15-bit adder is used as a library component. Within the adder, the carry output of each one-bit stage is connected to the carry input of the next, more significant stage, as shown in fig.5.1.2.

FIG 19: RTL schematic of adder block.
The following figure shows the output of the 15-bit adder block, where 'a' and 'b' are 15-bit input vectors. The output vector is stored in the variable 'yout', which holds the sum of the inputs 'a' and 'b'. 32-bit and 64-bit adder blocks are created in the same way. These adder blocks are used in the bit-parallel implementations of the digital filters.

FIG 20: Output of the two-input adder block.

The following figure shows the block diagram of the bit-serial adder. This adder needs memory elements to store the sum and the carry, and D flip-flops are used for this purpose. The carry output produced in one clock cycle is stored and fed back as the carry input in the next clock cycle. The reset bit is used to reset the output. The output does not appear at the output port until the set bit is asserted, as shown in fig.5.1.5.
FIG 21: Bit-serial adder block.
FIG 22: Test bench waveform of bit-serial adder.

IMPLEMENTATION OF DELAY SUB-BLOCKS USING VHDL:-
The following figures show the bit-serial and bit-parallel implementations of the delay sub-blocks. These delay sub-blocks are used as memory elements that store data for one clock cycle. The reset bit is used to reset the output.
FIG 23: Bit-serial delay block.
FIG 24: Bit-parallel delay block.

The inputs are applied at the rising edges of the clock, and the same value appears at the output after a delay of one clock cycle. The memory block thus behaves like a D flip-flop. The output is shown in fig.5.2.3, where 'd' is the input vector and 'q' is the output vector. The input vector 'd' appears in the time slot from 90 ns to 180 ns. The output vector 'q' appears immediately after the first rising edge of the clock (that is, from 180 ns to 360 ns). The memory block holds the output for at least one clock cycle.

FIG 25: Test bench waveform of delay block.
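At the sample level, the delay block is the z^-1 element used in the filters. A minimal Python sketch, assuming a reset value of 0 (the name `delay` is hypothetical):

```python
def delay(samples, reset_value=0):
    """One-clock-cycle delay element (D flip-flop / register), i.e. z^-1.

    During cycle n the output shows the value captured at the previous rising
    edge; before the first edge it shows the reset value.
    """
    q = reset_value
    out = []
    for d in samples:
        out.append(q)   # value held at the output during this clock cycle
        q = d           # rising edge: capture the input
    return out


print(delay([4, 3, 5, 2]))         # [0, 4, 3, 5]
print(delay(delay([4, 3, 5, 2])))  # [0, 0, 4, 3]: two cascaded delays form a shift register
```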
IMPLEMENTATION OF MULTIPLIER SUB-BLOCKS USING VHDL:-
A multiplier can be designed in different ways. Two such design methods are discussed here.

SERIAL PARALLEL MULTIPLIER SUB-BLOCK USING VHDL:-
The following figure shows the RTL schematic of a serial/parallel multiplier. One input vector, 'a', is applied serially to the circuit (one bit at a time, starting with the LSB), while the other, 'b', is applied in parallel (all bits simultaneously). Suppose that 'a' has M bits and 'b' has N bits. After all M bits of 'a' have been presented to the system, a string of N '0's must follow in order to complete the (M+N)-bit output product. As the figure shows, the system is pipelined and built from AND gates, full-adder units, and registers. Each unit of the pipeline (except the leftmost one) requires one adder, two registers, and an AND gate to compute one of its inputs.

FIG 26: Serial/parallel multiplier.

Simulation results are shown in the following figure. The value a = "1100" (decimal 12) was applied to the serial input. This input must start with the LSB (a(0) = '0'), which appears in the time slot from 50 ns to 100 ns. The MSB (a(3) = '1') appears from 350 ns to 400 ns. Recall that four zeros must then follow. At the parallel input, b = "1101" (decimal 13) was applied. The expected result, prod = "10011100" (decimal 156), can be observed in the lower plot. Recall that the first bit out is the LSB, prod(0) = '0'. It appears in the time slot immediately after the first rising edge of the clock (that is, from 100 ns to 200 ns), while the last bit (the MSB) of prod appears from 600 ns to 700 ns. This kind of serial/parallel multiplier is used as the multiplier in the bit-serial arithmetic implementations.

FIG 27: Simulation result of serial/parallel multiplier.

BOOTH MULTIPLIER:-
Booth multiplication algorithm for radix 4.
One way of realizing high-speed multipliers is to increase parallelism, which reduces the number of subsequent calculation stages. The original version of the Booth algorithm (radix 2) had two drawbacks:
(i) The number of add/subtract operations and the number of shift operations vary, which is inconvenient when designing parallel multipliers.
(ii) The algorithm becomes inefficient when there are isolated 1's.
These problems are overcome by the modified radix-4 Booth algorithm, which scans strings of three bits as follows:
1) Extend the sign bit by one position if necessary to make n even.
2) Append a 0 to the right of the LSB of the multiplier.
3) Depending on the value of each 3-bit vector, each partial product is 0, +y, -y, +2y, or -2y.
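The three steps above can be sketched in Python as a behavioral model of radix-4 Booth recoding. It is not the project's VHDL Booth multiplier, and the function names are hypothetical. Python's `>>` on negative integers already sign-extends, so step 1 only needs to round n up to an even number.

```python
def booth_radix4_digits(x, n):
    """Radix-4 Booth recoding of an n-bit two's-complement multiplier x.

    Step 1: make n even (sign extension is implicit for Python ints).
    Step 2: an implicit 0 is appended to the right of the LSB (prev = 0).
    Step 3: each overlapping triplet (x[i+1], x[i], x[i-1]) selects the digit
            -2*x[i+1] + x[i] + x[i-1], which is one of -2, -1, 0, +1, +2.
    """
    if n % 2:
        n += 1
    digits = []
    prev = 0
    for i in range(0, n, 2):
        b0 = (x >> i) & 1
        b1 = (x >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def booth_radix4_multiply(x, y, n):
    """Sum the n/2 partial products (0, +-y, +-2y), each shifted two bits further left."""
    return sum((d * y) << (2 * k) for k, d in enumerate(booth_radix4_digits(x, n)))


print(booth_radix4_digits(-3, 4), booth_radix4_multiply(-3, 5, 4))  # [1, -1] -15
```

A 4-bit multiplier yields only two partial products instead of four. This is the n/2 reduction that makes the radix-4 scheme attractive for parallel multipliers.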
The negative values of y are obtained by taking the two's complement, and carry look-ahead (CLA) fast adders are used here. The value 2y is obtained by shifting y one bit to the left. Thus, an n-bit parallel multiplier generates only n/2 partial products.

FIG 28: Simulation result for the Booth multiplier.

The Booth multiplier is used in the bit-parallel arithmetic implementations. Its output is shown in Fig. 28.

MULTIPLY ACCUMULATE SUB-BLOCKS USING VHDL:-
Multiplication followed by accumulation is a common operation in many digital systems, particularly highly interconnected ones such as digital filters, neural networks, and data quantizers.
FIG 29: MAC circuit.

A typical MAC (multiply-accumulate) architecture is illustrated in Fig. 29. It multiplies two values and adds the result to the previously accumulated value, which is then stored back in the register for future accumulations. A MAC circuit must also check for overflow, which can occur when the number of MAC operations is large. The design can be built from components, since each of the units shown in Fig. 29 has already been designed. Because it is a relatively simple circuit, it can also be designed directly. In either case, the MAC circuit as a whole can be used as a component in applications such as digital filters and neural networks.
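A minimal Python sketch of one MAC step with overflow detection. It assumes a `width`-bit two's-complement accumulator register. The `saturate` option is an illustrative design choice that is not taken from the text: by default the register wraps around, and the overflow flag reports the event. `mac_step` is a hypothetical name.

```python
def mac_step(acc, a, b, width, saturate=False):
    """One multiply-accumulate step into a `width`-bit two's-complement register.

    Returns (new_acc, overflow). On overflow the register wraps around by
    default, or is clamped to the largest/smallest value if saturate=True.
    """
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    total = acc + a * b                      # multiply, then add to the stored value
    overflow = not (lo <= total <= hi)       # overflow check
    if overflow and saturate:
        return (hi if total > hi else lo), True
    return ((total - lo) % (1 << width)) + lo, overflow


acc = 0
for a, b in [(3, 4), (5, 6), (9, 10)]:
    acc, ovf = mac_step(acc, a, b, width=8)
    print(acc, ovf)   # 12 False, then 42 False, then -124 True (132 does not fit in 8 bits)
```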
IMPLEMENTATION & ANALYSIS OF FIR FILTERS

Digital signal processing finds innumerable applications in audio, video, and communications. Such applications are generally based on LTI (linear time-invariant) systems, which can be implemented with digital circuitry. An LTI system is represented by the following equation:

$$\sum_{k=0}^{N} a_k\, y[n-k] = \sum_{k=0}^{M} b_k\, x[n-k]$$

Here a_k and b_k are the filter coefficients, and x[n-k] and y[n-k] are the current (k = 0) and earlier (k > 0) input and output values, respectively. To implement this expression, registers are needed to store x[n-k] and/or y[n-k] (for k > 0), in addition to multipliers and adders, which are well-known building blocks in the digital domain.

Digital filters are divided into two categories according to their impulse response: IIR (infinite impulse response) and FIR (finite impulse response). The former corresponds to the general case described by the equation above, while the latter occurs when N = 0. Only FIR filters can exhibit linear phase, so they are indispensable when linear phase is required, as in many telecom applications. With N = 0, the equation above becomes

$$y[n] = \sum_{k=0}^{M} c_k\, x[n-k]$$

where c_k = b_k/a_0 are the coefficients of the FIR filter. This equation can be realized by the system shown in the following figure, where D (delay) represents a register (flip-flops), a triangle is a multiplier, and a circle is an adder.
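The difference equation above can be solved sample by sample in a short Python sketch. It uses exact fractions and assumes zero initial state (all registers reset). With a single a-coefficient (N = 0), the same routine reduces to the FIR form with c_k = b_k/a_0. `lti_filter` is a hypothetical name.

```python
from fractions import Fraction


def lti_filter(b, a, x):
    """Solve sum_k a[k] y[n-k] = sum_k b[k] x[n-k] for y[n], one sample at a time.

    Samples before n = 0 are taken as zero. With len(a) == 1 (N = 0) this is an
    FIR filter with c[k] = b[k] / a[0]; otherwise earlier outputs are fed back (IIR).
    """
    y = []
    for n in range(len(x)):
        acc = sum(Fraction(b[k]) * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(Fraction(a[k]) * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


# FIR case: a = [2], b = [6, 18, 12, 26] gives c = [3, 9, 6, 13]
print([int(v) for v in lti_filter([6, 18, 12, 26], [2], [4, 3, 5, 2])])  # [12, 45, 66, 121]
# IIR case: y[n] - 0.5 y[n-1] = x[n]; the impulse response never becomes exactly zero
print(lti_filter([1], [1, Fraction(-1, 2)], [1, 0, 0, 0]))
```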
TRANSVERSAL STRUCTURE OR DIRECT FORM REALIZATION OF FIR FILTER:-
The system function of an FIR filter can be written as

$$H(z) = \sum_{n=0}^{N-1} h(n)\, z^{-n} = h(0) + h(1) z^{-1} + h(2) z^{-2} + \cdots + h(N-1) z^{-(N-1)} \qquad \text{(Eq. 6.1.1)}$$

$$Y(z) = h(0) X(z) + h(1) z^{-1} X(z) + h(2) z^{-2} X(z) + \cdots + h(N-1) z^{-(N-1)} X(z)$$

This equation is realized by the structure in the following figure, known as the transversal structure. It requires N multipliers, N-1 adders, and N-1 delay elements.

FIG 30: Transversal structure or direct form realization of FIR filter (with five coefficients).

An equivalent RTL representation is shown in Fig. 6.1.3. The values of 'x' are stored in a shift register whose outputs are connected to the multipliers and then to the adders. The coefficients must be stored on chip. If the coefficients never change, their values can be implemented with logic gates rather than registers. In a general-purpose filter, registers are required for the coefficients. In this architecture the output vector 'y' is always registered, in order to provide a clean synchronous output.

FIG 31: FIR filter diagram (with four coefficients).
FIG 32: RTL representation of FIR filter.

The circuit can be constructed in many ways. However, if it is intended for future reuse or sharing, it should be as generic as possible. The lower section of the filter contains a MAC (multiply-accumulate) pipeline, which is closely related to the MAC circuit discussed previously. Here, too, overflow can occur, so an add/truncate procedure must be included in the design. In this circuit, arbitrary coefficients are chosen as constants, and no design algorithm is used to generate them. The chosen values are coeff(0) = 3, coeff(1) = 9, coeff(2) = 6, coeff(3) = 13. Simulation results are shown in the following figure.
TRANSVERSAL STRUCTURE OR DIRECT FORM REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC:-
FIG 33: FIR filter direct form realization using bit-parallel arithmetic.

The figure above shows the output of the direct form realization of the FIR filter using bit-parallel arithmetic. The 8-bit input vector 'x' is fed in parallel at each rising edge of the clock. Recall that the coefficients are coeff(0) = 3, coeff(1) = 9, coeff(2) = 6, coeff(3) = 13. The input sequence applied was x[0] = 4, x[1] = 3, x[2] = 5, x[3] = 2. With all flip-flops previously reset, the expected output at the first positive clock edge is y[0] = coeff(0)·x[0] = 12, which matches the first output value of 'y' in Fig.3. At the next rising clock edge, the expected value is y[1] = coeff(0)·x[1] + coeff(1)·x[0] = 45. One clock cycle later it is y[2] = coeff(0)·x[2] + coeff(1)·x[1] + coeff(2)·x[0] = 66, and so on.
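The expected outputs above can be reproduced with a clocked Python model of the transversal structure. It is a behavioral golden model of the kind a test bench could compare against, not the project's VHDL. The class name `TransversalFIR` is hypothetical. The model keeps N-1 delay registers and a registered output, as in the RTL description.

```python
class TransversalFIR:
    """Clocked model of the direct-form (transversal) FIR filter.

    N coefficients use N multipliers, N-1 adders and N-1 delay registers.
    The output is registered: y[n] is available after the clock edge at
    which x[n] is applied.
    """

    def __init__(self, coeffs):
        self.coeffs = list(coeffs)
        self.delays = [0] * (len(self.coeffs) - 1)  # x[n-1] ... x[n-N+1], reset to 0
        self.y = 0                                   # output register

    def clock(self, x):
        taps = [x] + self.delays                     # x[n], x[n-1], ...
        self.y = sum(c * t for c, t in zip(self.coeffs, taps))
        self.delays = taps[:-1]                      # shift register advances
        return self.y


fir = TransversalFIR([3, 9, 6, 13])
print([fir.clock(x) for x in [4, 3, 5, 2]])  # [12, 45, 66, 121]
```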
TRANSVERSAL STRUCTURE OR DIRECT FORM REALIZATION OF FIR FILTER USING BIT SERIAL ARITHMETIC:-
The following figure shows the output of the direct form realization of the FIR filter using bit-serial arithmetic. Five single-bit inputs, 'x0' to 'x4', are used, and they are fed at the rising edges of the clock. Since the data are fed serially, the LSB is applied first; it appears in the time slot from 50 ns to 100 ns. The input sequence applied was x(0) = 4, x(1) = 3, x(2) = 5, x(3) = 2. With all flip-flops previously reset, the expected first output is y(0) = coeff(0)·x(0) = 12. Its LSB appears in the time slot immediately after the first rising clock edge (50 ns to 100 ns), while the last bit (MSB) of 'y0' appears from 200 ns to 250 ns. The expected value of y1 is coeff(0)·x1 + coeff(1)·x0 = 45. This output involves one addition, and each bit-serial addition delays the output by one clock cycle. Output 'y1' therefore appears one clock cycle later than 'y0': the LSB of 'y1' appears in the time slot from 100 ns to 150 ns, an initial latency of one clock cycle. 'y2' follows the same trend: its LSB appears in the time slot from 150 ns to 200 ns, an initial latency of two clock cycles. The next output has an initial latency of three clock cycles, and the trend continues for the remaining outputs.

FIG 34: FIR filter direct form realization using bit-serial arithmetic.
SIMULATION TIME ANALYSIS OF TRANSVERSAL STRUCTURE OR DIRECT FORM REALIZATION FILTER USING BIT PARALLEL ARITHMETIC & BIT SERIAL ARITHMETIC:-
In bit-serial arithmetic the data are fed serially: first the LSB is applied, then the second bit at the next clock pulse, and so on. All input variables are fed in this way, and the outputs are obtained in the same fashion. This way of entering input data and extracting output data introduces latency in the output waveform. Because the latency grows from one output to the next, the last bit (MSB) of each output takes longer to appear in the waveform. The MSB of output 'y0' appears from 200 ns to 250 ns, and the MSB of 'y1' from 450 ns to 500 ns. In bit-parallel arithmetic each output also has an initial latency, but all output bits of a stage (y15 to y0) appear synchronously with the clock. All bits of an output word, from LSB to MSB, are therefore available within a single clock cycle. For bit-parallel arithmetic, Fig. 6.1.4 shows y[2] = coeff(0)·x[2] + coeff(1)·x[1] + coeff(2)·x[0] = 66 from 350 ns to 450 ns. For bit-serial arithmetic, Fig. 6.1.5 shows y[2] = 66 spread from 200 ns to 600 ns (LSB at 200 ns and MSB at 600 ns). In bit-serial arithmetic the LSB to MSB of an output thus fall in different clock cycles, which is not the case in bit-parallel arithmetic. A bit-serial design therefore takes longer to deliver a complete output word than a bit-parallel design. This is one disadvantage of bit-serial arithmetic compared to bit-parallel arithmetic.
AREA ANALYSIS OF TRANSVERSAL STRUCTURE OR DIRECT FORM REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC & BIT SERIAL ARITHMETIC:-
The design summary of the direct form FIR filter using bit-parallel arithmetic (Chart 1) reports 479 4-input LUTs, 271 occupied slices, and 26 bonded I/Os. Fig. 6.1.7 shows the design summary of the direct form realization using bit-serial arithmetic (Chart 2), which reports 73 4-input LUTs, 47 occupied slices, and 13 bonded I/Os. The bit-parallel realization therefore uses more 4-input LUTs, more occupied slices, and more bonded I/Os than the bit-serial realization. It needs (479 - 73) = 406 extra LUTs, (271 - 47) = 224 extra occupied slices, and (26 - 13) = 13 extra bonded I/Os. Only the number of slice flip-flops is higher in the bit-serial realization (78 versus 34).

Number of slice flip-flops       34
Number of 4-input LUTs          479
Number of occupied slices       271
Number of bonded I/Os            26
Chart 1: Design summary of direct form realization of FIR filter using bit-parallel arithmetic.

This comparison shows that the bit-parallel implementation of the direct form realization needs more chip area than the bit-serial implementation. As modern electronic devices become ever smaller, chip area is an important design parameter for any electronic circuit. If chip area is the main concern, the bit-serial implementation of this digital filter is therefore preferable to the bit-parallel implementation. Power consumption is also related to chip area: a larger chip area generally means higher power consumption.

Number of slice flip-flops       78
Number of 4-input LUTs           73
Number of occupied slices        47
Number of bonded I/Os            13
Chart 2: Design summary of direct form realization of FIR filter using bit-serial arithmetic.

POWER ANALYSIS OF TRANSVERSAL STRUCTURE OR DIRECT FORM REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC & BIT SERIAL ARITHMETIC:-
A comparison of the total estimated power consumption of the direct form realizations shows that the bit-parallel FIR filter consumes more power than the bit-serial one. Fig. 6.1.8 shows the XPower analysis data for the direct form realizations, obtained with the Xilinx tool. The filter using bit-serial arithmetic consumes 0.084 W, while the same filter using bit-parallel arithmetic consumes 0.090 W.
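The differences quoted above, together with the relative savings they imply, can be recomputed with a few lines of Python. The inputs are copied from Chart 1, Chart 2, and the XPower results. The percentages are simply derived from them and do not come from any additional measurement.

```python
from decimal import Decimal

# Resource counts copied from Chart 1 (bit-parallel) and Chart 2 (bit-serial).
parallel = {"slice_ff": 34, "lut4": 479, "slices": 271, "bonded_io": 26}
serial = {"slice_ff": 78, "lut4": 73, "slices": 47, "bonded_io": 13}

extra = {k: parallel[k] - serial[k] for k in parallel}          # negative = serial uses more
saving_pct = {k: round(100 * extra[k] / parallel[k], 1) for k in parallel}
print(extra)        # {'slice_ff': -44, 'lut4': 406, 'slices': 224, 'bonded_io': 13}
print(saving_pct)   # {'slice_ff': -129.4, 'lut4': 84.8, 'slices': 82.7, 'bonded_io': 50.0}

# Total estimated power from the XPower analysis, in watts.
p_par, p_ser = Decimal("0.090"), Decimal("0.084")
print(p_par - p_ser, round(100 * (p_par - p_ser) / p_par, 2))  # 0.006 6.67
```

By these figures, the bit-serial version saves roughly 85% of the LUTs and 83% of the slices but only about 7% of the estimated total power. This is consistent with the later observation that static (quiescent) power makes up a large part of the total.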
Direct form realization of FIR filter using bit-parallel arithmetic     0.090 W
Direct form realization of FIR filter using bit-serial arithmetic       0.084 W
Chart 3: Power summary (total estimated power consumption) of direct form realization of FIR filters.

According to the first waveform of Fig. 6.1.9, the total power consumption is the sum of the quiescent power, logic power, I/O power, and digital clock manager (DCM) power. Quiescent power (also called static power) is the power drawn by the device when it is powered up and configured with user logic but there is no switching activity. In XPower Analyzer, the value reported for Total Quiescent Power is composed of the following quiescent power components:
- Device static power: the power consumed by the device when it is powered up without the user logic programmed. The main contributor to this number is the junction temperature, and any change in the device operating environment affects this power.
- Design static power: the power consumed by the user logic when the device is programmed but there is no switching activity. For instance, depending on the device family and resource configuration, some blocks used in a design (such as clock management, I/Os, and multi-gigabit transceivers) consume a set amount of power regardless of activity.

The logic power accounts for the CLB resources estimated for use in the design, including LUTs, SRLs, LUT-based RAMs, and flip-flops. By implementing the pre-existing blocks that make up a design, it is possible to estimate the resource utilization of most of the design accurately. These resource utilization estimates help to predict the logic power, which is typically the larger share of the dynamic power consumed by a design.
With high switching speeds and capacitive loads, switching I/O power can be a substantial part of the total power consumption of an FPGA. For this reason, all I/O-related parameters must be defined accurately to estimate I/O power. The Digital Clock Manager (DCM) primitive in Xilinx FPGAs implements functions such as a delay-locked loop, digital frequency synthesizer, digital phase shifter or digital spread spectrum. The digital clock manager module is a wrapper around the DCM primitive that allows it to be used in the EDK tool suite.
FIG 35: Output waveform of power analysis of FIR filter (Direct form) using bit parallel arithmetic.
The bit parallel implementation uses more input/output ports than the bit serial one. The first waveforms of FIG 35 and FIG 36 show that the bit parallel implementation consumes more power because of these additional input/output ports.
Junction temperature plays an important part in determining the device static power: a small change in junction temperature can change the device power consumption considerably. The third and fourth waveforms of FIG 35 and FIG 36 show how power changes with junction temperature for bit parallel and bit serial arithmetic. The results above show that the direct form FIR filter built with bit serial arithmetic consumes less power than the bit parallel version. If power is one of the design criteria, it is therefore better to design direct form FIR filters using bit serial arithmetic.
FIG 36: Output waveform of power analysis of FIR filter (direct form) using bit serial arithmetic.
CASCADE REALIZATION OF FIR FILTER:-
The transfer function H(z) of the transversal (direct form) structure can be realized in cascade form by factoring it. For N odd:

$$H(z) = \prod_{k=1}^{(N-1)/2} \left(b_{k0} + b_{k1} z^{-1} + b_{k2} z^{-2}\right) = \left(b_{10} + b_{11} z^{-1} + b_{12} z^{-2}\right)\left(b_{20} + b_{21} z^{-1} + b_{22} z^{-2}\right)\cdots\left(b_{\frac{N-1}{2},0} + b_{\frac{N-1}{2},1} z^{-1} + b_{\frac{N-1}{2},2} z^{-2}\right) \qquad \text{(Eq.6.2.1)}$$

For N odd, N-1 is even and H(z) has (N-1)/2 second order factors. Each second order factor is realized in direct form, and the sections are cascaded to realize H(z) as shown in Fig.6.2.1.
FIG 37: Cascade realization of Eq.6.2.1.
For N even:

$$H(z) = \left(1 + b_{10} z^{-1}\right) \prod_{k=2}^{N/2} \left(b_{k0} + b_{k1} z^{-1} + b_{k2} z^{-2}\right) \qquad \text{(Eq.6.2.2)}$$

When N is even, N-1 is odd, and H(z) has one first order factor and (N-2)/2 second order factors:

$$H(z) = \left(1 + b_{10} z^{-1}\right)\left(b_{20} + b_{21} z^{-1} + b_{22} z^{-2}\right)\left(b_{30} + b_{31} z^{-1} + b_{32} z^{-2}\right)\cdots\left(b_{\frac{N}{2},0} + b_{\frac{N}{2},1} z^{-1} + b_{\frac{N}{2},2} z^{-2}\right)$$
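Each factor in Eq.6.2.1 is a polynomial in z^{-1}, so cascading the sections is equivalent to multiplying their coefficient lists (a discrete convolution). The Python sketch below is our own illustration, not part of the authors' VHDL. It assumes integer coefficients and uses the three sections (1, 2, 3), (4, 5, 6) and (7, 8, 9) from the bit parallel cascade example in this section. These give N = 7 taps and (N-1)/2 = 3 second order factors.

```python
def poly_mul(a, b):
    """Multiply two polynomials in z^-1 stored as coefficient lists (index = power of z^-1)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def cascade_to_direct(sections):
    """Expand a cascade of sections into the equivalent direct-form coefficients."""
    h = [1]
    for section in sections:
        h = poly_mul(h, section)
    return h


sections = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # (b_k0, b_k1, b_k2) for k = 1, 2, 3
h = cascade_to_direct(sections)
print(h)  # [28, 123, 336, 530, 594, 387, 162]
```

Three second order factors expand to seven taps (N = 7, odd), as Eq.6.2.1 predicts. Filtering x = 4, 3, 2 with these seven taps gives 112, 576 and 1769, the same first three outputs as the cascade.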
Each factor of H(z) is then realized in direct form, and the sections are cascaded to obtain the realization of H(z), as shown in FIG 38.
FIG 38: Cascade realization of Eq.6.2.2.

CASCADE REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC:-
FIG 39 shows the output of the cascade realization of the FIR filter using bit parallel arithmetic. Here the 8-bit input vector 'x' is fed in parallel on the rising edge of the clock. Recall that the coefficients are coeff(0)=1, coeff(1)=2, coeff(2)=3, coeff(3)=4, coeff(4)=5, coeff(5)=6, coeff(6)=7, coeff(7)=8 and coeff(8)=9. The input sequence is x[0]=4, x[1]=3, x[2]=2. Three slices are used:
• the first stage output is stored in vector 'Y';
• the second stage output is stored in vector 'Z';
• the final stage output is stored in vector 'P'.
With all flip-flops previously reset, the expected output at the first positive clock edge is Y[0]=coeff(0)*x[0]=4. At the next rising edge the expected value is Y[1]=coeff(0)*x[1]+coeff(1)*x[0]=11. One clock cycle later, Y[2]=coeff(0)*x[2]+coeff(1)*x[1]+coeff(2)*x[0]=20.
Consider next the output of the second slice (Fig.6.2.4). At the first positive clock edge the expected output value is Z[0]=coeff(3)*Y[0]=16. At the next rising edge Z[1]=coeff(3)*Y[1]+coeff(4)*Y[0]=64, and one clock cycle later Z[2]=coeff(3)*Y[2]+coeff(4)*Y[1]+coeff(5)*Y[0]=159.
FIG 39: FIR Filter Cascade realization using bit parallel arithmetic.
Finally, the filter output is the output of the third slice (slice 2). It is P[0]=coeff(6)*Z[0]=112. At the next rising edge P[1]=coeff(6)*Z[1]+coeff(7)*Z[0]=576, and one clock cycle later P[2]=coeff(6)*Z[2]+coeff(7)*Z[1]+coeff(8)*Z[0]=1769. P[0] appears in the time slot from 300 ns to 380 ns.
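A short software reference model reproduces the expected values above. The Python sketch below is our own, not the authors' code. It treats each slice as a three-tap direct form FIR section with zero initial state. It checks only the sample values, not the clock-cycle timing of the VHDL design.

```python
def fir(coeffs, x):
    """Direct-form FIR: y[n] = sum_k coeffs[k] * x[n-k], with x[n] = 0 for n < 0 (reset state)."""
    return [sum(c * x[n - k] for k, c in enumerate(coeffs) if n - k >= 0)
            for n in range(len(x))]


coeff = [1, 2, 3, 4, 5, 6, 7, 8, 9]
x = [4, 3, 2]

Y = fir(coeff[0:3], x)   # first slice:  coeff(0..2)
Z = fir(coeff[3:6], Y)   # second slice: coeff(3..5)
P = fir(coeff[6:9], Z)   # third slice:  coeff(6..8)
print(Y, Z, P)  # [4, 11, 20] [16, 64, 159] [112, 576, 1769]
```

Keeping the slices as separate calls mirrors the three stored vectors Y, Z and P. This makes it easy to compare each intermediate stage against the simulation waveforms.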
CASCADE REALIZATION OF FIR FILTER USING BIT SERIAL ARITHMETIC:-
The bit serial implementation of this filter is built from bit serial adders, serial-parallel multipliers and delay elements. Each bit serial adder contains a register that stores the carry-out bit and feeds it back as the carry-in in the next clock cycle. A digital filter is built from such adder blocks, so every addition adds a delay of at least one clock cycle. This is one of the reasons bit serial filters have a high initial latency.
FIG 41 shows the output of the cascade realization of the FIR filter using bit serial arithmetic. Here three single-bit inputs, x0 to x2, are fed on the rising edge of the clock. The data are fed serially, so the LSB is applied first; it appears in the time slot from 60 ns to 120 ns. The input sequence is x(0)=4, x(1)=3, x(2)=2. With all flip-flops previously reset, the expected first output value is y(0)=112. The LSB of the output appears first (180 ns to 240 ns), and the last bit (MSB) of 'y0' appears in the slot from 480 ns to 540 ns.
The output 'y0' appears after a delay of three clock cycles from the first rising edge. This is because the delay of the serial addition in each slice accumulates as the result propagates through the three slices. The expected value y1=576 appears after a delay of six clock cycles from the first rising edge. The delay keeps growing for 'y2', whose LSB appears after nine clock cycles.
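The carry-register behaviour described above can be modelled bit by bit. The Python sketch below is an illustrative model, not the authors' VHDL, and it assumes unsigned operands and a fixed word length. Each loop iteration stands for one clock cycle: the full adder combines the current LSB-first input bits with the stored carry, and the carry register is then updated for the next cycle.

```python
def bit_serial_add(a_bits, b_bits):
    """Add two LSB-first bit streams with one full adder and a one-bit carry register."""
    carry = 0                                        # carry flip-flop, cleared by reset
    out = []
    for a, b in zip(a_bits, b_bits):                 # one iteration = one clock cycle
        out.append(a ^ b ^ carry)                    # sum bit of the full adder
        carry = (a & b) | (a & carry) | (b & carry)  # stored for the next cycle
    return out


def to_bits(value, width):
    """Unsigned integer -> list of bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """List of bits, LSB first -> unsigned integer."""
    return sum(bit << i for i, bit in enumerate(bits))


s = bit_serial_add(to_bits(112, 12), to_bits(576, 12))
print(from_bits(s))  # 688
```

A W-bit word needs W clock cycles to stream through the adder. This is why a bit serial result is only complete once its MSB has arrived, several cycles after the LSB.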
FIG 41: FIR Filter Cascade realization using bit serial arithmetic.

SIMULATION TIME ANALYSIS OF CASCADE REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC & BIT SERIAL ARITHMETIC:-
In bit serial arithmetic one data bit is fed on each clock cycle. As discussed earlier, the bit serial adder uses a register to store the carry-out bit and feed it back as the carry-in in the next clock cycle. Serial adders are an integral part of this filter design, and each addition adds one clock cycle of delay. This is one of the reasons bit serial filters have a high initial latency. This cascade filter uses three slices. If one addition takes place in each slice, there is an initial latency of three clock cycles per output, because each output passes through each slice only once.
The two implementations compare as follows (FIG 39 for bit parallel, FIG 41 for bit serial):
• Y[0]=112: with bit parallel arithmetic it appears from 300 ns to 400 ns, after an initial latency of two clock cycles. With bit serial arithmetic it appears after an initial latency of three clock cycles.
• Y[1]=576: three clock cycles of initial latency with bit parallel arithmetic, six with bit serial arithmetic.
• Y[2]=1769: four clock cycles with bit parallel arithmetic, nine with bit serial arithmetic.
This comparison shows that if the number of slices is increased further, the initial latency of the bit serial design grows much faster than that of the bit parallel design. In bit parallel arithmetic the delay is due only to the registers, since reading data out of a register requires waiting at least one clock cycle. Therefore, if simulation time is taken as a design parameter, the cascade realization of the FIR filter using bit parallel arithmetic is advantageous.

AREA ANALYSIS OF CASCADE REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC & BIT SERIAL ARITHMETIC:-
According to the design summary of the cascade realization using bit parallel arithmetic (Chart 4), the design uses 149 slice flip-flops, 729 4-input LUTs, 395 occupied slices and 74 bonded INPUT/OUTPUT pins. Fig.6.2.7 shows the design summary of the cascade realization using bit serial arithmetic: 80 slice flip-flops, 63 4-input LUTs, 42 occupied slices and 9 bonded INPUT/OUTPUT pins. Comparing the two summaries, the bit parallel realization uses more of every resource than the bit serial realization:
• (149-80)=69 extra slice flip-flops;
• (729-63)=666 extra LUTs;
• (395-42)=353 extra occupied slices;
• (74-9)=65 extra bonded INPUT/OUTPUT pins.
This comparison shows that the bit parallel implementation of the cascade realization needs more chip area than the bit serial implementation. Chip area is an important design parameter for any electronic circuit. If the design is judged on chip area, the bit serial implementation of this filter is therefore advantageous. Power consumption is also related to chip area, since a larger design generally consumes more power.

Number of slice flip-flops: 149
Number of 4-input LUTs: 729
Number of occupied slices: 395
Number of bonded INPUT/OUTPUT: 74
Chart 4: Design summary of Cascade realization of FIR Filter using bit parallel arithmetic.

Number of slice flip-flops: 80
Number of 4-input LUTs: 63
Number of occupied slices: 42
Number of bonded INPUT/OUTPUT: 9
Chart 5: Design summary of Cascade realization of FIR Filter using bit serial arithmetic.
POWER ANALYSIS OF CASCADE REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC & BIT SERIAL ARITHMETIC:-
The total estimated power consumption of the cascade realization shows that the bit parallel implementation consumes more power than the bit serial implementation. The XPower analysis of the cascade realization was obtained with the Xilinx tools. According to it, the bit serial cascade filter consumes an estimated 0.083 W, while the same filter built with bit parallel arithmetic consumes 0.091 W (Chart 6).

Total estimated power consumption (W):
  Cascade realization of FIR filter using bit parallel arithmetic: 0.091 W
  Cascade realization of FIR filter using bit serial arithmetic: 0.083 W
Chart 6: Power summary of Cascade realization of FIR filters.

According to the waveform of Fig.6.2.9, the total power consumption is the sum of quiescent power, logic power, I/O power and digital clock manager power. Each of these components is described earlier in this chapter.
FIG 42: Output waveform of power analysis of FIR filter (Cascade realization) using bit parallel arithmetic.
Comparing the first waveforms of FIG 42 and FIG 43 shows that quiescent power, logic power and digital clock manager (DCM) power are almost the same in both cases. I/O power, however, is higher in the bit parallel case. The bit parallel design uses more input/output ports than the bit serial design, which is why the bit parallel cascade filter consumes more I/O power. As a result, the overall power consumption of the bit parallel cascade FIR filter is higher than that of the bit serial cascade filter.
FIG 43: Output waveform of power analysis of FIR filter (Cascade realization) using bit serial arithmetic.
The third and fourth waveforms of FIG 42 and FIG 43 show how power changes with junction temperature for bit parallel and bit serial arithmetic. Junction temperature plays an important role in determining the device static power: a small change in junction temperature can change the device power consumption drastically. Here the operating junction temperature is 27.1 °C. The study therefore shows that if power is one of the design criteria, it is better to design the cascade realization of FIR filters using bit serial arithmetic rather than bit parallel arithmetic.
LATTICE STRUCTURE OF AN FIR FILTER:-
Consider an FIR filter with system function

$$H(z) = A_m(z) = 1 + \sum_{k=1}^{m} \alpha_m(k)\, z^{-k}, \qquad m \ge 1$$

from which

$$Y(z) = X(z)\left[1 + \sum_{k=1}^{m} \alpha_m(k)\, z^{-k}\right]$$

Taking the inverse z-transform of both sides gives

$$y(n) = x(n) + \sum_{k=1}^{m} \alpha_m(k)\, x(n-k) \qquad \text{(Eq.6.3.1)}$$

Eq.6.3.1 represents an FIR system with system function $H(z) = A_m(z)$. The lattice structure for an all-zero FIR system is obtained from the all-pole lattice by interchanging the roles of input and output:
• For an all-pole filter, the input is $x(n) = f_N(n)$ and the output is $y(n) = f_0(n)$.
• For an all-zero FIR system of order M-1, the input is $x(n) = f_0(n)$ and the output is $y(n) = f_{M-1}(n)$.
For m = 1, Eq.6.3.1 reduces to

$$y(n) = x(n) + \alpha_1(1)\, x(n-1)$$

This output can be obtained from the single stage lattice filter shown in FIG 44, for which

$$x(n) = f_0(n) = g_0(n)$$
$$y(n) = f_1(n) = f_0(n) + k_1 g_0(n-1) = x(n) + k_1 x(n-1)$$
$$g_1(n) = k_1 f_0(n) + g_0(n-1) = k_1 x(n) + x(n-1)$$

Comparing the expression for $f_1(n)$ with the m = 1 form of Eq.6.3.1 gives $\alpha_1(0) = 1$ and $\alpha_1(1) = k_1$.
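The single stage equations can be checked numerically. The Python sketch below is an illustrative model, not the authors' VHDL. It assumes zero initial state, i.e. x(-1) = 0 after reset, and evaluates f1(n) and g1(n) for the input x = 1, 2, 3, 4 with k1 = 3, the values used in the bit parallel lattice simulation of this section.

```python
def lattice_stage(k, f_prev, g_prev):
    """One all-zero lattice stage:
    f_m(n) = f_{m-1}(n) + k * g_{m-1}(n-1)
    g_m(n) = k * f_{m-1}(n) + g_{m-1}(n-1)
    with g_{m-1}(-1) = 0 (registers reset)."""
    g_delayed = [0] + g_prev[:-1]          # g_{m-1}(n-1)
    f = [fn + k * gd for fn, gd in zip(f_prev, g_delayed)]
    g = [k * fn + gd for fn, gd in zip(f_prev, g_delayed)]
    return f, g


x = [1, 2, 3, 4]
y, g = lattice_stage(3, x, x)  # f0(n) = g0(n) = x(n), k1 = 3
print(y, g)  # [1, 5, 9, 13] [3, 7, 11, 15]
```

The two-stage lattice of FIG 45 is obtained by calling the function again, feeding the (f, g) pair of one stage into the next stage with k2.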
FIG 44: Single stage all-zero lattice filter.
Now consider an FIR filter with m = 2:

$$y(n) = x(n) + \alpha_2(1)\, x(n-1) + \alpha_2(2)\, x(n-2)$$

The output y(n) can be obtained by cascading two lattice stages, as shown in FIG 45.
FIG 45: Two stage all-zero lattice filter.
From FIG 45, the outputs of the second stage are

$$y(n) = f_2(n) = f_1(n) + k_2 g_1(n-1)$$
$$g_2(n) = k_2 f_1(n) + g_1(n-1)$$
Substituting $f_1(n)$ and $g_1(n-1)$ from the single stage equations into the expression for $f_2(n)$ gives

$$y(n) = f_2(n) = x(n) + k_1 x(n-1) + k_2\left[k_1 x(n-1) + x(n-2)\right] = x(n) + k_1(1 + k_2)\, x(n-1) + k_2\, x(n-2)$$

This is identical to the m = 2 form of Eq.6.3.1. It follows that $\alpha_2(0) = 1$, $\alpha_2(2) = k_2$ and $\alpha_2(1) = k_1(1 + k_2) = \alpha_1(1)\left[1 + \alpha_2(2)\right]$. Similarly,

$$g_2(n) = k_2\, x(n) + k_1(1 + k_2)\, x(n-1) + x(n-2)$$

LATTICE STRUCTURE OF FIR FILTER USING BIT PARALLEL ARITHMETIC:-
FIG 46 shows the output of the single stage lattice realization of the FIR filter using bit parallel arithmetic. The block diagram of the single stage is shown in FIG 44. Here the 7-bit input vector x is fed in parallel on the rising edge of the clock, and the only coefficient is k1=3. The input sequence is x[0]=1, x[1]=2, x[2]=3, x[3]=4. All flip-flops are previously reset, so x[-1]=0.
The expected values of output y are:
• first positive clock edge: y[0]=x[0]+k1*x[-1]=1;
• next rising edge: y[1]=x[1]+k1*x[0]=5;
• one clock cycle later: y[2]=x[2]+k1*x[1]=9;
• finally: y[3]=x[3]+k1*x[2]=13.
Each stage has two outputs, so there is a second output sequence 'g':
• first positive clock edge: g[0]=k1*x[0]+x[-1]=3;
• next rising edge: g[1]=k1*x[1]+x[0]=7;
• one clock cycle later: g[2]=k1*x[2]+x[1]=11;
• finally: g[3]=k1*x[3]+x[2]=15.
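The relation $\alpha_2(1) = \alpha_1(1)[1 + \alpha_2(2)]$ is one step of the standard lattice-to-direct-form ("step-up") recursion:
• $\alpha_m(0) = 1$;
• $\alpha_m(m) = k_m$;
• $\alpha_m(j) = \alpha_{m-1}(j) + k_m \alpha_{m-1}(m-j)$ for $0 < j < m$.
The Python sketch below implements that textbook recursion as our own illustration. The two-stage values k1 = 3, k2 = 2 are hypothetical and serve only to check the result against $\alpha_2(1) = k_1(1 + k_2)$.

```python
def lattice_to_direct(k):
    """Convert lattice coefficients k_1..k_M into direct-form alpha_M(0..M)."""
    alpha = [1]                            # alpha_0(0) = 1
    for m, km in enumerate(k, start=1):
        prev = alpha + [0]                 # alpha_{m-1}(m) = 0
        alpha = [prev[j] + km * prev[m - j] for j in range(m + 1)]
    return alpha


k1, k2 = 3, 2                              # hypothetical example coefficients
alpha2 = lattice_to_direct([k1, k2])
print(alpha2)  # [1, 9, 2]
```

The coefficients of $g_2(n)$ are the same list in reverse order, [2, 9, 1]. This matches $g_2(n) = k_2 x(n) + k_1(1 + k_2) x(n-1) + x(n-2)$.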
FIG 46: FIR Filter Lattice realization using bit parallel arithmetic.

LATTICE STRUCTURE OF FIR FILTER USING BIT SERIAL ARITHMETIC:-
The bit serial implementation of this filter is built from bit serial adders, serial-parallel multipliers and delay elements. Fig.6.3.4 shows the output of the lattice realization using bit serial arithmetic. Here four single-bit inputs, x0 to x3, are fed on the rising edge of the clock. The data are fed serially, so the LSB is applied first; for all four inputs it appears in the time slot from 50 ns to 150 ns. The input sequence is x0=1, x1=2, x2=3, x3=4. With all flip-flops previously reset, the expected first outputs are y0=1 and g0=3. In both outputs 'y' and 'g' the LSB appears first, after a delay of one clock cycle from the first rising edge (100 ns to 200 ns). The values 'y1' and 'g1' appear after a delay of two clock cycles from the first rising edge, and the pattern continues for y2 and g2. Finally, the LSBs of y3 and g3 appear after a delay of four clock cycles.
FIG 47: FIR Filter Lattice realization using bit serial arithmetic.

SIMULATION TIME ANALYSIS OF LATTICE REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC & BIT SERIAL ARITHMETIC:-
As discussed earlier, the bit serial adder uses a register to store the carry-out bit and feed it back as the carry-in in the next clock cycle, which introduces a delay of one clock cycle. The filter is built from this adder block, so one clock cycle of delay accumulates for every addition in the filter. This is one of the reasons bit serial filters have a high initial latency. This lattice filter example uses one stage. Each output depends on the present input and the previous input (the input at the earlier clock edge), so delay propagates through the addition in each output. In both the bit serial and the bit parallel realizations, each output has one more cycle of initial latency than the previous output. For example, the LSB of 'y1' appears one clock cycle later than the LSB of 'y0'.
In the bit serial case, however, the LSB and MSB of a word do not appear in the same clock cycle. The output LSB appears late, and the remaining bits up to the MSB take further cycles to appear. This is not the case in the bit parallel implementation, where the LSB and MSB of an output appear in the same clock cycle, so less simulation time is needed to obtain the output. This study shows that the simulation time of the lattice FIR filter using bit serial arithmetic is much higher than that of the bit parallel implementation. From the simulation time point of view, bit parallel arithmetic is therefore advantageous for designing lattice filters.

AREA ANALYSIS OF LATTICE REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC & BIT SERIAL ARITHMETIC:-
According to the design summary of the lattice realization using bit parallel arithmetic in Fig.6.3.5, the design uses 5 slice flip-flops, 123 4-input LUTs, 65 occupied slices and 42 bonded INPUT/OUTPUT pins. Fig.6.3.6 shows the design summary of the lattice realization using bit serial arithmetic: 24 slice flip-flops, 24 4-input LUTs, 14 occupied slices and 15 bonded INPUT/OUTPUT pins. Comparing the two summaries, the bit parallel realization uses more 4-input LUTs, occupied slices and bonded INPUT/OUTPUT pins than the bit serial realization:
• (123-24)=99 extra LUTs;
• (65-14)=51 extra occupied slices;
• (42-15)=27 extra bonded INPUT/OUTPUT pins.
Number of slice flip-flops: 5
Number of 4-input LUTs: 123
Number of occupied slices: 65
Number of bonded INPUT/OUTPUT: 42
Chart 7: Design summary of Lattice realization of FIR Filter using bit parallel arithmetic.

The bit serial realization, however, uses more slice flip-flops than the bit parallel realization (24-5=19 more), which differs from the previous two cases. Overall, the bit parallel implementation of the lattice realization still needs more chip area than the bit serial implementation. Chip area is an important design parameter for any electronic circuit, so the bit serial implementation of this filter is advantageous if the design is judged on chip area.

Number of slice flip-flops: 24
Number of 4-input LUTs: 24
Number of occupied slices: 14
Number of bonded INPUT/OUTPUT: 15
Chart 8: Design summary of Lattice realization of FIR Filter using bit serial arithmetic.
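The resource differences quoted in the area analyses can be recomputed directly from Charts 4, 5, 7 and 8, together with bit parallel to bit serial ratios. The small Python helper below is only a convenience for checking this arithmetic. The ratio is our own added metric, not one used in the report.

```python
RESOURCES = ("slice flip-flops", "4-input LUTs", "occupied slices", "bonded INPUT/OUTPUT")

# (bit parallel, bit serial) design summaries taken from Charts 4/5 and 7/8
cascade = ((149, 729, 395, 74), (80, 63, 42, 9))
lattice = ((5, 123, 65, 42), (24, 24, 14, 15))


def compare(parallel, serial):
    """Return {resource: (parallel - serial, parallel / serial rounded to 2 places)}."""
    return {name: (p - s, round(p / s, 2))
            for name, p, s in zip(RESOURCES, parallel, serial)}


for label, (par, ser) in (("cascade", cascade), ("lattice", lattice)):
    print(label, compare(par, ser))
```

The negative flip-flop difference for the lattice (-19) is the exception noted above. Every other resource is larger in the bit parallel design, for example about 11.6 times as many LUTs in the cascade realization.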
POWER ANALYSIS OF LATTICE REALIZATION OF FIR FILTER USING BIT PARALLEL ARITHMETIC & BIT SERIAL ARITHMETIC:-
The XPower analysis of the lattice realization was obtained with the Xilinx tools. According to it, the lattice FIR filter using bit serial arithmetic consumes an estimated 0.057 W, while the same filter built with bit parallel arithmetic consumes 0.068 W.

Total estimated power consumption (W):
  Lattice realization of FIR filter using bit parallel arithmetic: 0.068 W
  Lattice realization of FIR filter using bit serial arithmetic: 0.057 W
Chart 9: Power summary of lattice realization of FIR filters.

The total estimated power consumption of the lattice realization therefore shows that the bit parallel implementation consumes more power than the bit serial implementation. According to the waveform of FIG 48, the total power consumption is the sum of quiescent power, logic power, I/O power and digital clock manager power. Each of these components is described earlier in this chapter.
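A relative figure makes the three power comparisons easier to compare with each other. The Python sketch below is our own arithmetic on the values in Charts 3, 6 and 9, not an additional measurement. It computes the bit serial saving as a percentage of the bit parallel estimate.

```python
# (bit parallel, bit serial) total estimated power in watts, from Charts 3, 6 and 9
POWER_W = {
    "direct form": (0.090, 0.084),
    "cascade": (0.091, 0.083),
    "lattice": (0.068, 0.057),
}


def saving_percent(parallel_w, serial_w):
    """Power saved by the bit serial design, as a percentage of the bit parallel power."""
    return 100.0 * (parallel_w - serial_w) / parallel_w


for name, (par, ser) in POWER_W.items():
    print(f"{name}: {saving_percent(par, ser):.1f} % lower with bit serial arithmetic")
```

With these chart values, the saving is roughly 6.7 % for the direct form, 8.8 % for the cascade and 16.2 % for the lattice realization.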
FIG 48: Output waveform of power analysis of FIR filter (Lattice realization) using bit parallel arithmetic.
Comparing the first waveforms of FIG 48 and FIG 49 shows that quiescent power, logic power and digital clock manager (DCM) power are almost the same in both cases. I/O power, however, is higher in the bit parallel case. The bit parallel design uses more input/output ports than the bit serial design, which is why the bit parallel lattice filter consumes more I/O power. As a result, the overall power consumption of the bit parallel lattice FIR filter is higher than that of the bit serial lattice filter.
FIG 49: Output waveform of power analysis of FIR filter (Lattice realization) using bit serial arithmetic.
Junction temperature plays an important role in determining the device static power: a small change in junction temperature can change the device power consumption drastically. The third and fourth waveforms of FIG 48 and FIG 49 show how power changes with junction temperature for bit parallel and bit serial arithmetic. Here the operating junction temperature is 26.3 °C. The study therefore shows that if power is one of the design criteria, it is better to design the lattice realization of FIR filters using bit serial arithmetic rather than bit parallel arithmetic.
CONCLUSION
This work presents an approach to the design and implementation of very fast fixed-function digital filters using bit serial and bit parallel arithmetic. The main concerns of the filter designs are high throughput, small chip area and low power consumption. Increased throughput can be traded for reduced power consumption through power supply voltage scaling. VHDL and an FPGA provided the platform for realizing the Direct form, Cascade form and Lattice structure of digital filters using bit serial and bit parallel arithmetic. A comparative study of these filters in terms of simulation time, chip area and power consumption gives the following results:
(i) Simulation time: bit parallel digital filters take less simulation time than bit serial implementations.
(ii) Initial latency: bit serial filters have higher initial latency than bit parallel filters.
(iii) Chip area: bit parallel digital filters occupy a much larger area than the same filters realized using bit serial arithmetic.
(iv) Power consumption: bit serial digital filters consume less power than bit parallel implementations.
The filters were designed in VHDL using the Xilinx software, version 7.1i, on a PC. A Spartan-3E kit, connected to the PC through a USB port, was chosen for implementation. With this direct connection, however, incompatibilities arise when the number of input bits exceeds 16. The Spartan-3E kit has 16 input ports and 8 output ports, so the hardware implementation was restricted to checking the peripheral filter circuitry (such as the adder, multiplier and subtractor). All the designed filters have 32 input bits and 32 output bits, so a proper interface that can handle more input and output bits still has to be developed.
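The conclusion's point about trading throughput for power through voltage scaling follows from the usual first-order CMOS dynamic power model. This is a textbook relation, not something measured in this project. In it, α is the switching activity, C_L the switched capacitance, V_DD the supply voltage and f the clock frequency:

$$P_{\text{dyn}} = \alpha\, C_L\, V_{DD}^{2}\, f$$

If an architecture has spare throughput, the supply voltage can be lowered, which slows the logic, until the design just meets the required sample rate. At the same clock frequency, reducing V_DD to 0.8 V_DD reduces the dynamic power to 0.8² = 0.64 of its original value. Static (quiescent) power is not covered by this relation.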
FUTURE PLANS
This work is restricted to three FIR filter realizations. Other FIR realizations, such as direct form II and the linear phase realization, could be implemented with the same arithmetic. Other arithmetic styles, such as distributed arithmetic and digit serial arithmetic, can be incorporated in future work.
The project also has limitations in measuring the area of the designed filters, because proper simulation tools were not available. In Chapter 6, the total chip area is measured by counting look-up tables, flip-flops and similar resources. One of our future goals is therefore to develop a simulation tool that can measure the exact chip area in mm² or µm².
In the same chapter, power consumption was measured indirectly with software tools. No free simulation tool is available to measure the power of the designed filters directly and analyze their power performance.
In future we also plan to apply our design expertise to IIR filters.
VHDL CODES FOR FIR FILTERS
• USING BIT-PARALLEL ARITHMETIC
• VHDL CODE FOR 4-BIT COUNTER

-- 4-bit synchronous binary up-counter built from four D flip-flops.
-- The d_flip_flop entity is listed with the bit-serial codes.
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

entity counter_4_bit is
  Port ( clk  : in    STD_LOGIC;
         rst  : in    STD_LOGIC;
         q    : inout STD_LOGIC_VECTOR (4 downto 1);
         qbar : inout STD_LOGIC_VECTOR (4 downto 1));
end counter_4_bit;

architecture Behavioral of counter_4_bit is
  component d_flip_flop is
    Port ( d   : in  STD_LOGIC;
           clk : in  STD_LOGIC;
           rst : in  STD_LOGIC;
           q   : out STD_LOGIC);
  end component;
  signal i, j, k, l : STD_LOGIC;  -- next-state inputs of the four flip-flops
begin
  qbar <= not q;
  -- Next-state logic: each bit toggles when all lower bits are '1'.
  i <= qbar(1);
  j <= q(1) xor q(2);
  k <= (q(1) and q(2) and qbar(3)) or (q(3) and (qbar(1) or qbar(2)));
  l <= (q(4) and (qbar(1) or qbar(2) or qbar(3))) or (q(1) and q(2) and q(3) and qbar(4));
  d1: d_flip_flop port map(i, clk, rst, q(1));
  d2: d_flip_flop port map(j, clk, rst, q(2));
  d3: d_flip_flop port map(k, clk, rst, q(3));
  d4: d_flip_flop port map(l, clk, rst, q(4));
end Behavioral;

• VHDL CODE FOR BOOTH MULTIPLIER (radix-4 Booth encoder / partial-product generator)

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;   -- provides sxt
use IEEE.STD_LOGIC_SIGNED.ALL;  -- signed "<" and "+" on std_logic_vector

entity encoder is
  Port ( a     : in  std_logic_vector(7 downto 0);   -- multiplicand (two's complement)
         arg   : in  std_logic_vector(2 downto 0);   -- Booth triple of the multiplier
         pprod : out std_logic_vector(15 downto 0)); -- sign-extended partial product
end encoder;

architecture Behavioral of encoder is
  -- Radix-4 Booth recoding: 000/111 -> 0, 001/010 -> +A, 011 -> +2A,
  -- 100 -> -2A, 101/110 -> -A. The result is 9 bits wide.
  function booth_encode(arg1 : std_logic_vector(2 downto 0);
                        data : std_logic_vector(7 downto 0)) return std_logic_vector is
    variable temp, temp1, temp2 : std_logic_vector(8 downto 0);
  begin
    case arg1 is
      when "001" | "010" =>                    -- +A: sign-extend to 9 bits
        if data < 0 then temp := '1' & data;
        else temp := '0' & data; end if;
      when "011" =>                            -- +2A: shift left by one
        if data < 0 then temp1 := '1' & data; temp := temp1(7 downto 0) & '0';
        else temp := '0' & data(6 downto 0) & '0'; end if;
      when "100" =>                            -- -2A: negate, then shift left
        if data < 0 then temp1 := '1' & data;
        else temp1 := '0' & data; end if;
        temp2 := (not temp1) + "000000001";
        temp := temp2(7 downto 0) & '0';
      when "101" | "110" =>                    -- -A: two's complement negation
        if data < 0 then temp1 := '1' & data;
        else temp1 := '0' & data; end if;
        temp := (not temp1) + "000000001";
      when others =>                           -- "000" and "111": zero
        temp := "000000000";
    end case;
    return temp;
  end booth_encode;

  signal s1 : std_logic_vector(8 downto 0);
begin
  s1 <= booth_encode(arg, a);
  pprod <= sxt(s1, 16);                         -- sign-extend to 16 bits
end Behavioral;

• VHDL CODE FOR SIXTEEN BIT FULL ADDER

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_SIGNED.ALL;

entity sixteenbit_fa is
  Port ( a : in STD_LOGIC_VECTOR (15 downto 0);
         b : in STD_LOGIC_VECTOR (15 downto 0);
