

ISSN: 2278 – 1323
International Journal of Advanced Research in Computer Engineering & Technology, Volume 1, Issue 4, June 2012
All Rights Reserved © 2012 IJARCET

VLSI Implementation of Pipelined Fast Fourier Transform

K. Indirapriyadarsini¹, S. Kamalakumari², G. Prasannakumar³
¹,² Swarnandhra Engineering College; ³ Vishnu Institute of Technology
{darsiniprasanna13061, Kamalakumari162, godiprasanna3}@gmail.com

Abstract: Digital signal processing (DSP) has become a very important and dynamic research area, and many integrated circuits are now dedicated to DSP functions. Unfortunately, existing designs are restricted to low accuracy and a small number of samples. The Fourier transform is widely used in industrial applications as well as in scientific research. Its most common use is to transform a function of time into a function of frequency. In this paper, we present an efficient implementation of a pipeline FFT. Our design adopts the single-path delay feedback style as its hardware architecture. To eliminate the read-only memories (ROMs) used to store the twiddle factors, the proposed architecture applies a reconfigurable complex multiplier and bit-parallel multipliers to achieve a ROM-less FFT processor, which consumes less power than existing designs.

Index Terms: FFT, ROM, complex multiplier.

I. INTRODUCTION

The discrete Fourier transform (DFT) is a very important technique in modern digital signal processing (DSP) and telecommunications, especially for applications in orthogonal frequency-division multiplexing (OFDM) systems such as IEEE 802.11a/g [1], Worldwide Interoperability for Microwave Access (WiMAX) [2], Long Term Evolution (LTE) [3], and Digital Video Broadcasting-Terrestrial (DVB-T) [4]. However, the DFT is computationally intensive, with a time complexity of $O(N^2)$. The fast Fourier transform (FFT) proposed by Cooley and Tukey [5] reduces the time complexity to $O(N \log_2 N)$, where $N$ denotes the FFT size.

For hardware implementation, various FFT processors have been proposed [6]. These implementations can be mainly classified into memory-based and pipeline architecture styles. The memory-based architecture, also known as the single processing element (PE) approach, is widely adopted to design an FFT processor. This design style is usually composed of a main PE and several memory units, so its hardware cost and power consumption are both lower than those of the other (pipeline) architecture style. However, the memory-based style has long latency and low throughput, and it cannot be parallelized. The pipeline architecture style, on the other hand, removes these disadvantages at the cost of an acceptable hardware overhead.

Pipeline FFT processors generally follow one of two popular design types: the single-path delay feedback (SDF) pipeline architecture and the multiple-path delay commutator (MDC) pipeline architecture. The SDF pipeline FFT [6]-[7] requires less memory space (about N − 1 delay elements), has a multiplier utilization of less than 50%, and has a control unit that is easy to design. Such implementations are advantageous for low-power design, especially in portable DSP devices. For these reasons, the SDF pipeline FFT is adopted in our work. Our proposed architecture uses a reconfigurable complex constant multiplier and bit-parallel complex multipliers instead of ROMs to store the twiddle factors, an approach suited to power-of-2-radix FFT/IFFT processors. A short version of the present work has been published in [10]; this paper provides a more detailed and complete description of the entire work.

The rest of this paper is organized as follows. Section II briefly reviews the fast Fourier transform. Section III presents our proposed FFT architecture for wireless communication systems. Section IV discusses the performance evaluation of various FFT architectures. Finally, concluding remarks are given in Section V.

II. FFT AND IFFT ALGORITHMS

The discrete Fourier transform (DFT) $X_k$ of an $N$-point discrete-time signal $x_n$ is defined by:

$$X_k = \sum_{n=0}^{N-1} x_n W_N^{kn}, \quad 0 \le k \le N-1, \tag{1}$$

where $W_N = e^{-j2\pi/N}$ is the N-point primitive root of unity and $W_N^{kn} = e^{-j2\pi kn/N}$ is the twiddle factor. However, a straightforward implementation of this algorithm is obviously impractical due to the huge hardware
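To make the cost of Eq. (1) concrete, here is a minimal Python sketch that evaluates Eq. (1) directly and counts its complex multiplications. It is an illustration only, not part of the proposed hardware, and the function name `direct_dft` is hypothetical.

```python
import cmath
import math


def direct_dft(x: list[complex]) -> tuple[list[complex], int]:
    """Evaluate Eq. (1) term by term.

    Returns the spectrum X_k and the number of complex multiplications
    x_n * W_N^{kn} performed, which is always N * N.
    """
    n = len(x)
    spectrum: list[complex] = []
    multiplications = 0
    for k in range(n):
        acc = 0j
        for i in range(n):
            acc += x[i] * cmath.exp(-2j * math.pi * k * i / n)  # x_i * W_N^{ki}
            multiplications += 1
        spectrum.append(acc)
    return spectrum, multiplications


if __name__ == "__main__":
    n = 64
    _, count = direct_dft([1 + 0j] * n)
    radix2 = (n // 2) * int(math.log2(n))  # one twiddle multiply per radix-2 butterfly
    print(f"N={n}: direct DFT {count} multiplications, radix-2 FFT {radix2}")
```

For N = 64, the sketch reports 4096 multiplications for the direct form. A radix-2 FFT needs 192 butterfly twiddle multiplications (N/2 · log2 N), and many of those are trivial, such as multiplication by 1 or −j.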
required. Therefore, the fast Fourier transform (FFT) [5] was developed to speed up the computation and significantly reduce the hardware cost. Generally, the FFT analyzes an input signal sequence by using decimation-in-frequency (DIF) or decimation-in-time (DIT) decomposition to construct a computationally efficient signal-flow graph (SFG). Our work employs DIF decomposition because it matches the data flow of the single-path delay feedback pipeline. An example of a radix-2 DIF FFT SFG for N = 16 is depicted in Fig. 1.

[Fig. 1 Radix-2 DIF FFT signal-flow graph of length 8]

The radix-2 DIF FFT described above has a regular SFG and requires fewer complex multipliers. It is therefore suited to hardware implementation, because some complex multiplications can be simplified to reduce the chip area. For instance, an input signal multiplied by $W_{16}^{2}$ (that is, $W_8^{1}$) in Fig. 1 can be expressed as:

$$(a + jb)\,W_{16}^{2} = \frac{\sqrt{2}}{2}\left[(a + b) + j(b - a)\right], \tag{2}$$

where $(a + jb)$ denotes a discrete-time signal in complex form. Similarly, the complex multiplication by $W_{16}^{6}$ is given by:

$$(a + jb)\,W_{16}^{6} = \frac{\sqrt{2}}{2}\left[(b - a) - j(b + a)\right]. \tag{3}$$

Both equations ease hardware implementation, because each needs only two real additions and a multiplication by $\sqrt{2}/2$. The multiplication by $\sqrt{2}/2$ is easy to realize; its circuit is introduced in Section III-C. The inverse discrete Fourier transform (IDFT) of length N is given by:

$$x_n = \frac{1}{N}\sum_{k=0}^{N-1} X_k W_N^{-kn}, \quad 0 \le n \le N-1. \tag{4}$$

To reuse the same hardware core and thereby reduce the chip area [16], (4) can be rewritten as:

$$x_n = \frac{1}{N}\left(\sum_{k=0}^{N-1} X_k^{*} W_N^{kn}\right)^{*}, \quad 0 \le n \le N-1, \tag{5}$$

where the superscript * denotes complex conjugation. This new form can be viewed as a general DFT. In other words, the DFT and the IDFT can reuse the same hardware core, while the IDFT requires some extra computations: conjugating the input data $X_k$ and the DFT outputs, and dividing the result by N. This reuse of the DFT for the IDFT also simplifies the design of a DFT/IDFT processor and thus reduces the chip area, provided that the DFT and the IDFT are performed alternately rather than simultaneously.

III. PROPOSED ARCHITECTURE

Traditional hardware implementations of FFT/IFFT processors usually employ a ROM to look up the required twiddle factors and then use word-length complex multipliers to perform the FFT computation. However, this approach increases the hardware cost, so a bit-parallel complex constant multiplication scheme [8] is used to address this issue. Moreover, because the twiddle factors are symmetric, every complex multiplication used in the FFT computation can be expressed as one of the following three operation types:

$$W_N^{k}(a + jb) = W_N^{k - N/4}(b - ja), \quad N/4 < k < N/2, \tag{6}$$

$$W_N^{k}(a + jb) = -W_N^{k - N/2}(a + jb), \quad N/2 < k < 3N/4, \tag{7}$$

$$W_N^{k}(a + jb) = -W_N^{k - 3N/4}(b - ja), \quad 3N/4 < k < N. \tag{8}$$

Given these three equations, any twiddle factor can be obtained by combining these operation types with the primary twiddle factors $W_N^{k}$, $0 \le k \le N/4$. In other words, the value of any twiddle factor used in the FFT can be derived through these operation types, which significantly shrinks the ROM used to store the twiddle factors. Moreover, for hardware implementation, we add two extra operation types to further decrease the size of the ROM. This method can also shorten the critical path of the designed hardware so that the system clock becomes faster. The two additional operation types are given by:

$$W_N^{k}(a + jb) = \left[W_N^{N/4 - k}(b + ja)\right]^{*}, \quad 1 \le k < N/4, \tag{9}$$

$$W_N^{k}(a + jb) = -j\left[W_N^{N/2 - k}(b + ja)\right]^{*}, \quad N/4 \le k < N/2. \tag{10}$$
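The Python sketch below is a floating-point sanity check of the radix-2 DIF decomposition, the conjugation trick of Eq. (5), and the identities in Eqs. (2)-(3) and (6)-(10). It is our illustration, not the authors' Verilog design. It ignores fixed-point word lengths and SDF scheduling, and the function names (`twiddle`, `dif_fft`, `idft_via_dft`, `max_identity_error`) are hypothetical.

```python
import cmath
import math


def twiddle(n_points: int, k: int) -> complex:
    """Twiddle factor W_N^k = exp(-j*2*pi*k/N)."""
    return cmath.exp(-2j * math.pi * k / n_points)


def dif_fft(x: list[complex]) -> list[complex]:
    """Radix-2 decimation-in-frequency FFT with natural-order output.

    len(x) must be a power of two. Each recursion level is one butterfly
    stage: sums x[i] + x[i + N/2] feed the even-indexed outputs, and the
    differences multiplied by W_N^i feed the odd-indexed outputs.
    """
    n = len(x)
    if n == 1:
        return list(x)
    half = n // 2
    upper = [x[i] + x[i + half] for i in range(half)]
    lower = [(x[i] - x[i + half]) * twiddle(n, i) for i in range(half)]
    out = [0j] * n
    out[0::2] = dif_fft(upper)
    out[1::2] = dif_fft(lower)
    return out


def idft_via_dft(spectrum: list[complex]) -> list[complex]:
    """IDFT computed with the forward transform, following Eq. (5)."""
    n = len(spectrum)
    y = dif_fft([v.conjugate() for v in spectrum])  # conjugate inputs, forward DFT
    return [v.conjugate() / n for v in y]           # conjugate outputs, divide by N


def max_identity_error(n_points: int = 64, a: float = 0.3, b: float = -1.7) -> float:
    """Largest mismatch between the two sides of Eqs. (2), (3) and (6)-(10)."""
    z = complex(a, b)
    s = math.sqrt(2) / 2
    quarter, half = n_points // 4, n_points // 2

    def w(k: int) -> complex:
        return twiddle(n_points, k)

    errors = [
        abs(z * twiddle(16, 2) - s * complex(a + b, b - a)),     # Eq. (2)
        abs(z * twiddle(16, 6) - s * complex(b - a, -(b + a))),  # Eq. (3)
    ]
    for k in range(1, n_points):
        lhs = w(k) * z
        if quarter < k < half:      # Eq. (6)
            errors.append(abs(lhs - w(k - quarter) * complex(b, -a)))
        if half < k < 3 * quarter:  # Eq. (7)
            errors.append(abs(lhs + w(k - half) * z))
        if 3 * quarter < k:         # Eq. (8)
            errors.append(abs(lhs + w(k - 3 * quarter) * complex(b, -a)))
        if k < quarter:             # Eq. (9), 1 <= k < N/4
            errors.append(abs(lhs - (w(quarter - k) * complex(b, a)).conjugate()))
        if quarter <= k < half:     # Eq. (10)
            errors.append(abs(lhs + 1j * (w(half - k) * complex(b, a)).conjugate()))
    return max(errors)


if __name__ == "__main__":
    samples = [complex(i, -0.5 * i) for i in range(8)]
    roundtrip = idft_via_dft(dif_fft(samples))
    print(max(abs(u - v) for u, v in zip(samples, roundtrip)))  # close to 0
    print(max_identity_error())                                 # close to 0
```

The recursion follows the stage structure of a radix-2 DIF SFG but returns outputs in natural order. An in-place radix-2 DIF SFG normally produces them in bit-reversed order.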
A. Proposed Architecture

To improve on previous works in power reduction, we propose a radix-2 pipeline FFT/IFFT processor with low power consumption. The proposed architecture is composed of three different types of processing elements (PEs), a complex constant multiplier, delay-line (DL) buffers (shown as rectangles with a number inside), and some extra processing units for computing the IFFT. The conjugation performed by the extra processing units is easy to implement: it only takes the 2's complement of the imaginary part of a complex value. In addition, for the complex constant multiplier in Fig. 2, we propose a novel reconfigurable complex constant multiplier to eliminate the twiddle-factor ROM. This new multiplication structure is thus the key component in reducing the chip area and power consumption of our proposed FFT processor. The detailed functions of the modules shown in Fig. 2 are described in the following subsections.

B. Processing Elements

Based on the radix-2 FFT algorithm, the three types of processing elements (PE3, PE2, and PE1) used in our design are illustrated in Fig. 2, Fig. 4, and Fig. 3, respectively. The functions of these three PE types correspond to the butterfly stages shown in Fig. 1. First, the PE3 stage implements only a simple radix-2 butterfly structure and serves as a submodule of the PE2 and PE1 stages. In the figure, Iin and Iout are the real parts of the input and output data, respectively, and Qin and Qout denote the imaginary parts of the input and output data, respectively. Similarly, DL-Iin and DL-Iout stand for the real parts of the input and output of the DL buffers, and DL-Qin and DL-Qout for the imaginary parts.

The PE2 stage must compute the multiplication by $-j$ or 1. Since $(a + jb)(-j) = b - ja$, multiplication by $-j$ only swaps the real and imaginary parts and negates the new imaginary part. Note that the multiplication by −1 in Fig. 3 is implemented by taking the 2's complement of its input value.

The PE1 stage performs a more complex calculation than the PE2 stage: it is responsible for computing the multiplications by $-j$, $W_N^{N/8}$, and $W_N^{3N/8}$. Since $W_N^{3N/8} = -j\,W_N^{N/8}$, the latter can be computed either by multiplying by $W_N^{N/8}$ first and then by $-j$, or in the reverse order. Hence, the designed hardware uses this cascaded calculation together with multiplexers to realize all the necessary calculations of the PE1 stage. This arrangement also saves one bit-parallel multiplier, further lowering the hardware cost.

[Fig. 2 Circuit diagram of our proposed PE3 stage.]

C. Bit-Parallel Multipliers

As noted in Section II, the multiplication by $1/\sqrt{2}$ can employ a bit-parallel multiplier instead of a word-length multiplier and square-root evaluation, reducing the chip area. The bit-parallel operation in terms of powers of 2 is given by:

$$\text{Output} = \text{in} \times \frac{\sqrt{2}}{2} \approx \text{in} \times \left(2^{-1} + 2^{-3} + 2^{-4} + 2^{-6} + 2^{-8} + 2^{-14}\right). \tag{11}$$

A straightforward implementation of the above equation suffers from poor precision due to truncation error and costs more hardware. Therefore, to improve the precision and hardware cost, Eq. (11) can be rewritten as:

$$\text{Output} = \text{in} \times \frac{\sqrt{2}}{2} \approx \text{in} \times \left[1 + \left(1 + 2^{-2}\right)\left(2^{-6} - 2^{-2}\right)\right]. \tag{12}$$
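The following Python sketch models the shift-and-add constant multiplications of Eqs. (11) and (12) on integer (fixed-point) samples. It also models the PE1 multiplications by $-j$, $W_N^{N/8}$, and $W_N^{3N/8}$ built on them. This is an illustration under stated assumptions, not the authors' RTL. The truncating arithmetic right shifts, the mapping of the three shifts (>>2, >>4, >>2) onto Eq. (12), and all function names are our assumptions.

```python
from fractions import Fraction

# Shift amounts of the six power-of-two terms in Eq. (11).
EQ11_SHIFTS = (1, 3, 4, 6, 8, 14)


def mul_eq11(x: int) -> int:
    """Direct shift-and-add form of Eq. (11): six shifted copies, five additions.

    Python's >> on negative ints rounds toward minus infinity, like an
    arithmetic right shift on a two's-complement word.
    """
    return sum(x >> s for s in EQ11_SHIFTS)


def mul_eq12(x: int) -> int:
    """Factored form of Eq. (12): x * [1 + (1 + 2^-2)(2^-6 - 2^-2)].

    Three shifts and three additions; one possible reading of the Fig. 5
    circuit (assumption).
    """
    a = x >> 2        # x * 2^-2
    b = a >> 4        # x * 2^-6
    c = b - a         # x * (2^-6 - 2^-2)              addition 1
    d = c + (c >> 2)  # c * (1 + 2^-2)                 addition 2
    return x + d      # x * [1 + (1+2^-2)(2^-6-2^-2)]  addition 3


def mul_minus_j(re: int, im: int) -> tuple[int, int]:
    """(a + jb)(-j) = b - ja: swap the parts and negate (2's complement)."""
    return im, -re


def mul_w_n_over_8(re: int, im: int) -> tuple[int, int]:
    """(a + jb) * W_N^{N/8} via Eq. (2); W_N^{N/8} = exp(-j*pi/4) for every N.

    A butterfly forms a + b and b - a, then both share the sqrt(2)/2 multiplier.
    """
    return mul_eq12(re + im), mul_eq12(im - re)


def mul_w_3n_over_8(re: int, im: int) -> tuple[int, int]:
    """(a + jb) * W_N^{3N/8} = -j * [(a + jb) * W_N^{N/8}], the PE1 cascade."""
    return mul_minus_j(*mul_w_n_over_8(re, im))


if __name__ == "__main__":
    # Exact constants realized by each form, before any truncation.
    print(float(sum(Fraction(1, 2**s) for s in EQ11_SHIFTS)))                    # Eq. (11)
    print(float(1 + (1 + Fraction(1, 4)) * (Fraction(1, 64) - Fraction(1, 4))))  # Eq. (12)
    print(mul_w_n_over_8(10000, -3000), mul_w_3n_over_8(10000, -3000))
```

Evaluated exactly, the constant in Eq. (11) is 0.70709228515625 and the constant in Eq. (12) is 0.70703125 (181/256), against √2/2 ≈ 0.7071068. The factored form uses three shifts and three adders instead of six shifts and five adders, so it has fewer truncation steps, while its constant lies slightly further from √2/2.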
[Fig. 3 Circuit diagram of our proposed PE1 stage.]

[Fig. 4 Circuit diagram of our proposed PE2 stage.]

According to , the circuit diagram of the bit-parallel multiplier is illustrated in Fig. 5. The resulting circuit uses three additions and three barrel-shift operations. Fig. 6 shows how the complex multiplication by $W_N^{N/8}$ is realized with a radix-2 butterfly structure whose two outputs are both multiplied by $1/\sqrt{2}$. This circuit is used in the PE1 stage.

[Fig. 5 Circuit diagram of the bit-parallel multiplication by 1/√2.]

However, we do not replace every word-length multiplier with bit-parallel multipliers, for two reasons. The first concerns the operating rate: if bit-parallel multipliers are used, the clock rate decreases because of the many cascaded adders. The second is the high wiring complexity, because many bit-parallel multipliers must be switched to perform multiplications by different twiddle factors; in fact, this phenomenon also appears in [8]. For these two reasons, a word-length multiplier is still adopted to implement our complex constant multiplier, in consideration of operating speed and chip area. Note that our proposed complex constant multiplier does not introduce the high hardware cost described earlier, because no ROM is used.

IV. PERFORMANCE EVALUATION AND RESULT

For performance evaluation, the normalized power per FFT is defined as follows:
$$\text{Normalized power per FFT} = \frac{\text{power} \times 1000}{(\text{voltage})^{2} \times (\text{FFT size}) \times (\text{frequency})}. \tag{13}$$

The functionality of the proposed architecture was verified by simulation in Verilog HDL, and the simulation results validate the proposed architecture. To further validate it, we implemented the architecture on a commercial FPGA chip. The results show that the proposed architecture works very well.

[Fig. 6 Circuit diagram of the multiplication by $W_N^{N/8}$.]

[Fig. 7 FFT using the proposed architecture.]

V. CONCLUSION

A novel ROM-less, low-power pipeline FFT/IFFT processor for OFDM applications has been described in this paper. Considering the symmetric property of the twiddle factors in the FFT, we have designed a reconfigurable complex constant multiplier that significantly shrinks the twiddle-factor ROM; in fact, no ROM is needed in our design. As a result, our design has lower hardware cost and power consumption than existing designs. Our proposed scheme can also be adapted to larger-point FFT applications with smaller twiddle-factor ROMs. Because our design is relatively low in cost and power consumption, it can serve as a powerful FFT/IFFT processor in many other wireless communication systems.

REFERENCES

[1] IEEE Std 802.11a, "Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) specifications: High-Speed Physical Layer in the 5 GHz band," 1999.
[2] IEEE 802.16, "IEEE Standard for Air Interface for Fixed Broadband Wireless Access Systems," The Institute of Electrical and Electronics Engineers, Inc., June 2004.
[3] Chu Yu, Yi-Ting Liao, Mao-Hsu Yen, Pao-Ann Hsiung, and Sao-Jie Chen, "A Novel Low-Power 64-point Pipelined FFT/IFFT Processor for OFDM Applications," in Proc. IEEE Int'l Conference on Consumer Electronics, Jan. 2011, pp. 452-453.
[4] ETSI, "Digital Video Broadcasting (DVB); Framing Structure, Channel Coding and Modulation for Digital Terrestrial Television," ETSI EN 300 744 v1.4.1, 2001.
[5] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation of complex Fourier series," Math. Comput., vol. 19, pp. 297-301, Apr. 1965.
[6] S. He and M. Torkelson, "Designing Pipeline FFT Processor for OFDM (de)Modulation," in Proc. URSI Int. Symp. Signals, Systems, and Electronics, vol. 29, Oct. 1998, pp. 257-262.
[7] H. L. Groginsky and G. A. Works, "A pipeline fast Fourier transform," IEEE Transactions on Computers, vol. C-19, no. 11, pp. 1015-1019, Nov. 1970.
[8] Koushik Maharatna, Eckhard Grass, and Ulrich Jagdhold, "A 64-Point Fourier transform chip for high-speed wireless LAN application using OFDM," IEEE Journal of Solid-State Circuits, vol. 39, no. 3, pp. 484-493, Mar. 2004.
[9] Y. T. Lin, P. Y. Tsai, and T. D. Chiueh, "Low-power variable-length fast Fourier transform processor," IEE Proc. Comput. Digit. Tech., vol. 152, no. 4, pp. 499-506, July 2005.

K. Indirapriyadarsini: M.Tech student at Swarnandhra College of Engineering and Technology, Narsapuram. Major working areas are VLSI and embedded systems. Has presented a research paper at one national conference.

S. Kamalakumari: Associate Professor at Swarnandhra College of Engineering and Technology, Narsapuram. Major working areas are wireless communications, linear and digital ICs, and VLSI. Has seven years of teaching experience and has presented research papers at two national conferences.

G. Prasanna Kumar: Assistant Professor at Vishnu Institute of Technology. Has four years of teaching experience. Major working areas are digital signal processing, wireless communications, and embedded systems. Has presented a research paper at one national conference.