SlideShare a Scribd company logo
1 of 10
Download to read offline
International Journal of Science and Research (IJSR)
ISSN (Online): 2319-7064
Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438
Volume 4 Issue 1, January 2015
www.ijsr.net
Licensed Under Creative Commons Attribution CC BY
Design of High Speed VLSI Architecturefor 1-D
Discrete Wavelet Transform
Rashmi Patil1
, Dr. M. T. Kolte2
1
Research Student, Department of Electronics, B. D. C. O. E., Wardha, Maharashtra, India
2
Professor and H .O. D. of ENTC Department, India M. I. T. C. O. E., Pune, Maharashtra, India
Abstract: The work presents an implementation of discrete wavelet transform (DWT) using systolic array architecture in VLSI. The
architecture consists of filter unit, storage unit and control unit. This performs calculations of low pass and high pass coefficients by
using only one multiplier. This architecture has been implemented and simulated using VLSI. The hardware utilization efficiency is
more as compared to the referred due to the FBRA scheme. The systolic nature of this architecture corresponds to a clock speed of
19.27MHz forCoiflets1 wavelet as compared to other two wavelets. It has advantage for optimizing area and time. The architecture is
modular and cascadable for one or multi-dimensional DWT.
Keywords: DWT, FRA, hardware efficiency, six tap Fir Filter, Systolic Array
1. Introduction
In the last decade, there has been an enormous increase in
the applications of wavelets in various scientific disciplines
to relate the discrete-time filter bank with the theory of
continuous time function space. Typical applications of
wavelets include signal processing, image processing,
numerical analysis, statistics, biomedicine, etc.[1], [2].
Wavelet transform offers a wide variety of useful features, in
contrast to other transforms, such as Fourier transform or
cosine transform. Some of these are as follows:
• Computational complexity of O (N), where N is the
number of data samples [3]
• Efficient VLSI implementation [4]
• Low computational complexity, flexibility in representing
non stationary image signals
Redundancies in video sequence can be removed by using
Discrete Cosine Transform (DCT). DCT suffers from the
negative effects of blackness andmosquito noise resulting in
poor subjective quality of reconstructed images at high
compression [5].In order to meet the real time requirements
in many applications,design and implementation of DWT is
required [6], [7].
2. Discrete Wavelet Transform
DWT which is based onsub band coding,is fastcomputation
wavelet transform.It is easy to implement and reduces the
computation time and resources required.Wavelets can be
realized by iteration of filters with scaling.The resolution of
the signal,which is a measure of the amount of detail
information in the signal,is determined by the filtering
operations, and the scaleis determined by upsampling and
down sampling (subsampling) operations [2].
A schematic of three stage DWT decomposition is shown in
Figure1.
Figure 1: Three Stage DWT Decomposition using Pyramid
Algorithm
In Figure 1, the signal is denoted by the sequence a[n],where
n isan integer. The low pass filter is denoted by L1 while the
high pass filter is denoted by H1. At eachlevel,the high pass
filter produces detail information b[n], while the low
passfilterassociated with scaling function produces coarse
approximation,c[n].
Here the input signal a[n] has N samples.At the first
decomposition level,the signal is passed through the high
pass and low pass filters,followed by sub sampling by 2.The
output of the high pass filter has N/2 samples and b[n].These
N/2 samples constitute the first level of DWT
coefficients.The output of the low pass filter also has N/2
samples and c[n]. The signal is then passed through low pass
and high pass filters for furtherdecomposition. The output of
the second low pass filter followed bysub sampling has N/4
samples and e[n].The output of the second high pass filter
followed by subsampling has N/4 samples and d[n].The
second high pass filterconstitutes the second level ofDWT
coefficients.The low pass filter output is then filtered once
again for further decomposition and produces g[n],f[n] with
N/8 samples.The filtering and decimation process is
continued until the desired levelis reached.The maximum
number of levels depends on the length of the signal.
Paper ID: SUB1539 42
International Journal of Science and Research (IJSR)
ISSN (Online): 2319-7064
Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438
Volume 4 Issue 1, January 2015
www.ijsr.net
Licensed Under Creative Commons Attribution CC BY
3. Data Dependencies within DWT
The wavelet decomposition of a 1-D input signal for three
stages is shown in Figure 1. The transfer functions of the
sixth order high pass (g(n)) and low pass h(n)) filter can be
expressed as follows:
High(z) = go+g1z-1
+g2z-2
+g3z-3
+g4z-4
+g5z-5
(1a)
Low(z) = ho+h1z-1
+h2z-2
+h3z-3
+h4z-4
+h5z-5
(1b)
For clarity the intermediate and final DWT coefficientsin
Figure 1 are denoted by a,b,c,d,e,f,and g.The DWT
computations are complexbecause of the data dependencies
at different octaves.Equations 2a-2n shows the relationship
among a, b, c, d, e, f, and g.
1st
octave:
b(0) = g(0)a(0) + g(1)a(-1) + g(2)a(-2)
+ g(3)a(-3) + g(4)a(-4) +
g(5)a(-5) (2a)
b(2) = g(0)a(2) + g(1)a(1) + g(2)a(0) +
g(3)a(-1) + g(4)a(-2) + g(5)a(-
3) (2b)
b(4) = g(0)a(4) + g(1)a(3) + g(2)a(2) +
g(3)a(1) + g(4)a(0) + g(5)a(-1) (2c)
b(6) = g(0)a(6) + g(1)a(5) + g(2)a(4) +
g(3)a(3) + g(4)a(2) + g(5)a(1) (2d)
c(0) = h(0)a(0) + h(1)a(-1) + h(2)a(-2)
+ h(3)a(-3) + h(4)a(-4) +
h(5)a(-5)
(2e)
c(2) = h(0)a(2) + h(1)a(1) + h(2)a(0) +
h(3)a(-1) + h(4)a(-2) + h(5)a(-
3) (2f)
c(4) = h(0)a(4) + h(1)a(3) + h(2)a(2) +
h(3)a(1) + h(4)a(0) + h(5)a(-1) (2g)
c(6) = h(0)a(6) + h(1)a(5) + h(2)a(4) +
h(3)a(3) + h(4)a(2) + h(5)a(1) (2h)
2nd
octave:
d(0) = g(0)c(0) + g(1)c(-2) + g(2)c(-4)
+ g(3)c(-6) + g(4)c(-8) +
g(5)c(-10) (2i)
d(4) = g(0)c(4) + g(1)c(2) + g(2)c(0) +
g(3)c(-2) + g(4)c(-4) + g(5)c(-
6)
(2j)
e(0) = h(0)c(0) + h(1)c(-2) + h(2)c(-4)
+ h(3)c(-6) + h(4)c(-8) +
h(5)c(-10) (2k)
e(4) = h(0)c(4) + h(1)c(2) + h(2)c(0) +
h(3)c(-2) + h(4)c(-4) + h(5)c(-
6)
(2l)
3rd
octave:
f(0) = g(0)e(0) + g(1)e(-4) + g(2)e(-8) +
g(3)e(-12) + g(4)e(-16) + g(5)e(-20)
(2m)
g(0) = h(0)e(0) + h(1)e(-4) + h(2)e(-8) +
h(3)e(-12) + h(4)e(-16) + h(5)e(-20)
(2n)
4. Systolic Array (SA) Architecture for DWT
The design of DWT-SA is based on computation schedule
derived from Equations 2a-2n, which are the result of
applying the pyramid algorithm for N=8 to the sixstage
filter.
As shown in Equations 2a-2n there are eight first octave
coefficients computations, four second octave coefficients
and two third octave computations The DWT-SA
architecture comprisesof the following basic blocks:
4.1.The Filter Unit (FU)
4.2 The Storage Unit
4.3.The Control Unit (CU)
4.1. The Filter Unit (FU)
Equations 1a-1b show that the high pass and low pass filter
coefficients can be computed using a six stage multiply and
accumulate digital filter, where partial results are computed
at each stage separately and then progressively passed to the
next higher stage for accumulation. Equations 1a-1b
suggests that due to filter coefficient calculations spanning
over 5 clock cycles, a temporary storage is required for the
incoming data samples.
The filter unit proposed for this architecture is a six-stage
non-recursive FIR digitalfilter whose transfer functions for
the high-pass andlow-pass components are shown in
Equations 2a-2n.The systolic architecture of a six-stage filter
is shown in Figure-2.The latency of each filter stage is 1.
Figure 2: Systolic architecture of a six-stage filter
The filter behaves in a systolic manner by computing partial
products of more than one coefficient at a time. The first
multiply and accumulate stage computes the first partial
product and passes it to the second filter stage where it is
added to the second partial product. That repetitive action
continues until a complete DWT coefficient is out from the
sixth filter stage.
The filter consists of band select signal. If band selects is’1’
then it selects low pass coefficients. If band selects’0’then it
selects high pass coefficients.The partial results are passed
synchronously in a systolic manner from one cell to the
adjacent cell. The filter cell consist therefore of only one
multiplier, one adder and two registers to store high pass and
low pass coefficients.The RTL schematic of filter unit is
shown in Figure 3.
In filter cell, a signed number multiplication problem occurs.
The signed-number represents either positive, negative
numbers or one positive and other negative numbers. To
avoid this problem, the proposed filter cell consists of invert
and xor operation as shown in Figure 4.The RTL schematic
of filter cell is shown in Figure 5.
Paper ID: SUB1539 43
International Journal of Science and Research (IJSR)
ISSN (Online): 2319-7064
Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438
Volume 4 Issue 1, January 2015
www.ijsr.net
Licensed Under Creative Commons Attribution CC BY
Figure 3: The RTL schematic of filter unit
S_D
Dmul
Sadd
S_ Input S_ Hi_D S_L0_D
Previous
Input
Cout
Cin
Mux 13
Multiplier
Sm XOR S3 = Sa/s
Sadd
Mux 30 Comparator
30
Adder
Invert
Mux
S0_Input
Sm = S0 xor S
Cout Substractor
To the next stage
Figure 4: The Proposed Filter Cell
Figure 5: The RTL Schematic of Filter Cell
4.2 The Storage Unit
Two storage units are used in the proposed architecture.
• Input Delay Unit (ID) and
• Register Bank (RB)
Input Delay Unit (ID):
The data registers and input delay used in theses storage
units have been constructed from standard D-latch.Equations
2a-2n shows that the value of computed filter coefficient
depends on the present as well as the five previous data
samples. The negative time indexes in Equations 2
correspond to the reference starting time unit 0. Figure 6
shows the block diagram of the input delay unit.
Delay 1 Delay 2 Delay 3 Delay 4 Delay 5
To Switch 1 To Switch 2 To Switch 3 To Switch 4 To Switch 5 To Switch 6
Input
Figure 6: Input Delay Unit (ID)
As shown in figure five delays are connected serially.At any
clock cycle each delay passes its contents to its right
neighbor which results in only five past values being
retained.The inputof delayis applied to the switch.The RTL
schematic of input delay unit isshown inFigure 7.
Figure 7: The RTL Schematic of Input Delay Unit
Register Bank (RB):
In proposed architecture number of registers required is
26.Here, two 13 registers architecture are connected serially
and constitute one register bank and are controlled by an
overall clock.The output of one register is directly connected
to theinput of the adjacent register,and thusthe register bank
is implemented as shown in Figure 8and its RTL schematic
is shown in Figure 9.
Register 0 Register 1 Register 2 Register 25
Clock
From filter bank
To control unit
Figure 8: Register Bank (RB)
a(12:0)y(12:0)
b(12:0)
sel
mux13
x(12:0)mulout(30:0)
y(12:0)
muls1224
in1(30:0) out1(29:0)
U4 inv29
a(30:0)y(29:0)
b(29:0)
sel
U3
mux30
x(30:0)sadd
y(30:0)
U5
xors30
Hi_D0(12:0)
L0_D0(12:0)
Bandselect
Input(12:0)
PREV(30:0)
Output(30:0)
a(30:0)y
b(30:0)
comp313
s
addout(30:0)sadd
cout
x(29:0)
y(30:0)
adder311
ciny(30:0)
d(30:0)
coutsubt
a(30:0)y(30:0)
b(30:0)
sel
mux301
0014
1002
1002
1
1002
40000012
0009
40000012 3FFFFFED
40000012
00000012
3FFFFFED
0
40000012
0
40000042
1
0009
40000042
40000054
40000012
1
40000042
1
40000054
0
0
00000012
40000042
0
7FFFFFAC
40000054
40000054
40000054
7FFFFFAC
0
clk q(12:0)
d(12:0)
rst
U1
latch13
clk q(12:0)
d(12:0)
rst
U2
latch13
clk q(12:0)
d(12:0)
rst
U3
latch13
clk q(12:0)
d(12:0)
rst
U4
latch13
clk q(12:0)
d(12:0)
rst
U5
latch13
clk q(12:0)
d(12:0)
rst
U6
latch13
clk
rst
Output(12:0)
Input(12:0)
inout1(12:0)
inout2(12:0)
inout3(12:0)
inout4(12:0)
inout5(12:0)
clko
rst0
Paper ID: SUB1539 44
International Journal of Science and Research (IJSR)
ISSN (Online): 2319-7064
Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438
Volume 4 Issue 1, January 2015
www.ijsr.net
Licensed Under Creative Commons Attribution CC BY
Figure 9:The RTL Schematic of Register Bank (RB)
4.3 Control Unit
In DWT-SA architecture, schedule based on latency of 1 is
proposed to meet the real time requirements in some
applications.The computations are scheduled at the earliest
possible clock cycle,and computed output samples are
available one clock cycle after they have been scheduledas
shown in Table 1.The delay is minimized through the
pipeline facilitating real time operation.The computation
schedule in table corresponds to a high hardware utilization
of more than 85%.All the intermediate and the associated
periods of activity are listed in Table 2.
Table 1: Schedule for one complete set of computations
Cycles High-pass Low-pass
1 b(0) c(0)
2 -- --
3 b(2) c(2)
4 d(0) e(0)
5 b(4) c(4)
6 -- --
7 b(6) c(6)
8 d(4) e(4)
9 b(0) c(0)
10 f(0) Low-pass
Table 2: Activity periods for intermediate results
Sample Available Cycles Life Period
c(0) 1 1 to 12
c(2) 3 3 to 14
c(4) 5 5 to 16
c(6) 7 7 to 18
e(0) 12 12 to 38
e(4) 16 16 to 38
The complete design of the control unit for DWT-SA is
shown in Figure 10.The control unit uses switch, decoder,
and four FSM. The switching action is done by using FSM.
State diagram is used to represent FSM.CU directs data from
the Input Delay (ID) or the register bank (RB) to the filter
unit (FU). The CU multiplexes data from the ID every
second cycle, and from the RB in cycles 4, 6, and 8.In
cycle2,6, CU remains idle, i.e.it does not allow any passage
of data. Proper timing synchronization as well as enabling
and disabling of the CU are insured by the CLK signal. The
RTL schematic of control unit is shown in Figure 11.
FSM 0
FSM 1
Clk
Rst
Rst
Clk
Rst
Clk
FSM 3
FSM 2
Decoder
Y0
Y0
Y1
Y2
Y2
Y1
Y3
Y3
Clk
Rst
Clk
Rst Output
Figure 10: The Control Unit
Control logic consists of four FSM. The switch is operated
on this state diagram. According to that it accepts data from
input delay and register bank. Figure 12 shows the state
diagram for FSM0.Here, there are two states, S1 and
S2.When reset is 1, it produces output 1.Otherwise it
produces 0.
Figure 11: The RTL Schematic of Control Unit
s1
s2 y0<='0';
Rst ='1'
y0<='1';
Figure 12: State Diagram for FSM0
Figure 13 shows the state diagram for FSM1. It consists of
four states, S1, S2, S3, S4.When reset is 1, and the output is
0 for first clock. After that it goes to the next state, i.e. S2.At
that time clock is incremented by 1.When clock cycle is less
than 2,it produces output 0.But,when clock is greater than 2
then it goes to the next state. In S3 state, it produces output
1.This process is repeated for clock 3, 4, 5, 6, and 7. If clock
is greater than 7 then it again comes to the S3 state and
produces the output.
clk q(30:0)
d(30:0)
rst
clk q(30:0)
d(30:0)
rst
clk
rst
clk q(30:0)
d(30:0)
rst
clk q(30:0)
d(30:0)
rst
clk q(30:0)
d(30:0)
rst
clk q(30:0)
d(30:0)
rst
clk q(30:0)
d(30:0)
rst
clk q(30:0)
d(30:0)
rst
clk q(30:0)
d(30:0)
rst
clk q(30:0)
d(30:0)
rst
clk q(30:0)
d(30:0)
rst
clk q(30:0)
d(30:0)
rst
clk q(30:0)
d(30:0)
rst
Output1(30:0)
Output2(30:0)
Output3(30:0)
Output4(30:0)
Output5(30:0)
Output6(30:0)
Output7(30:0)
Output8(30:0)
Output9(30:0)
Output10(30:0)
Output11(30:0)
Output12(30:0)
Input(30:0)
clko
rsto
Output(30:0)
clk y0
rst
U1
fsm
clk y1
rst
U2
fsm1
clk y2
rst
U3
fsm2
clk y3
rst
U4
fsm3
y0 y(3:0)
y1
y2
y3
U5
cldecoder1
clk
Rst
Output(3:0)
0 0
0
0 0
0
0 1
0
0 0
0
0 2
0
1
0
0
0
2
Paper ID: SUB1539 45
International Journal of Science and Research (IJSR)
ISSN (Online): 2319-7064
Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438
Volume 4 Issue 1, January 2015
www.ijsr.net
Licensed Under Creative Commons Attribution CC BY
Figure 13: State Diagram for FSM1
Figure 14 shows state diagram for FSM2.When reset is
0,state S2 produces output 0.This occurs when clock is less
than 6.For the next clock i.e. Clock is greater than 6,the state
S3 produce output 1.But when clock is greater than 7,then
sate S4 changes its state to the previous state S3.
Figure 14: State Diagram for FSM2
The next is FSM3, which consist of S1, S2, S3, and S4.
Figure 15 shows the sate diagram for FSM3.When reset is 1,
output y3 produces 0.In sate S2, when clock cycle is less
than 8, then output is 0.But in the next clock cycle the output
is high. If clock cycle is greater than 7, it changes the state
S4 to S3 and produces output high. In all this FSM, outputs
are connected to decoder. The output of decoder is
connected to the select line of switch, by selecting particular
select line; switching action of switch is done. And
according to that switch, the data is applied to the filter cell.
Figure 15: State Diagram for FSM3
5. Register Allocation
In the DWT-SA architecture Control Unit (CU) and the
Register Bank (RB) synchronize the availability of
operands.There are two schemeswhich can be employed for
this purpose, namely;
• Forward Register Allocation (FRA)
• The Forward-Backward Register Allocation(FBRA)
The FRA method uses a set of registers which are allocated
to intermediate data on the first comefirst serve basis. It does
not reassign any registers to other operands once its contents
have beenaccessed.The FBRA scheme is similar, except that
once the operand stored has been used;there register is
allocated to another operand.The FRA method is
simpler,requires less control circuitry and permits easy
adaptation of this architecture for coefficient calculation of
more than 3 octaves.It results however in less
efficientregister utilization.In our work we used FRA
register allocation.
5.1 FRA Register Allocation
To demonstrate the construction of the register allocation
using the FRA approach,we will considerthe case of
computing the coefficients c(0) and c(2).As shown in Table
3coefficient c(0) is computed in clock cycle one.Whereas
coefficient c(2)is scheduled for computation in cycle
3.These two computed coefficients along with other
four;c(4),c(6),c(8),c(10) are needed for thecomputation
ofcoefficient(0).According to table e(0) computation is
scheduled for cycle 4.However the six coefficients c(0),c(2),
c(4),c(6),c(8),c(10) will not be available until cycle 12,and
therefore the calculation of e(0) has to be scheduled for
cycle 4+N,or 12.Similarly,coefficient e(4) is computed not
in cycle 8,but 8+N,thus cycle16.The number of registers
needed becomes apparent once one complete frame of
computations has been scheduled.
Table 3: FRA Register Allocation
Cycle Com R1 R2 R3 R4 R5 R6
1 c(0)
2 c(0)
3 c(2) c(0)
4 e(0) c(2) c(0)
5 c(4) c(2) c(0)
6 g(0) c(4) c(2) c(0)
7 c(6) c(4) c(2) c(0)
8 e(4) c(6) c(4) c(2)
9 c(0) c(6) c(4) c(2)
10 c(0) c(6) c(4)
11 c(2) c(0) c(6) c(4)
12 e(0) c(2) c(0) c(6)
13 c(4) e(0) c(2) c(0) c(6)
14 g(0) c(4) e(0) c(2) c(0)
15 c(6) c(4) e(0) c(2) c(0)
16 e(4) c(6) c(4) e(0) c(2)
17 c(0) e(4) c(6) c(4) e(0) c(2)
18 c(0) e(4) c(6) c(4) e(0)
19 c(2) c(0) e(4) c(6) c(4)
20 e(0) c(2) c(0) e(4) c(6)
21 c(4) e(0) c(2) c(0) e(4) c(6)
22 g(0) c(4) e(0) c(2) c(0) e(4)
Paper ID: SUB1539 46
International Journal of Science and Research (IJSR)
ISSN (Online): 2319-7064
Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438
Volume 4 Issue 1, January 2015
www.ijsr.net
Licensed Under Creative Commons Attribution CC BY
23 c(6) c(4) e(0) c(2) c(0)
24 e(4) c(6) c(4) e(0) c(2)
25 c(0) e(4) c(6) c(4) e(0) c(2)
26 c(0) e(4) c(6) c(4) e(0)
27 c(2) c(0) e(4) c(6) c(4)
28 e(0) c(2) c(0) e(4) c(6)
29 c(4) e(0) c(2) c(0) e(4) c(6)
30 g(0) c(4) e(0) c(2) c(0) e(4)
31 c(6) e(0) c(2) c(0)
32 e(4) c(6) e(0) c(2)
33 c(0) e(4) c(6) e(0) c(2)
34 e(4) c(6) e(0)
35 c(2) e(4) c(6)
36 e(0) c(2) e(4) c(6)
37 c(4) e(0) c(2) e(4) c(6)
38 g(0) c(4) e(0) c(2) e(0) e(4)
Table 3: FRA Register Allocation (continued)
Cycle R7 R8 R9 R10 R11 R12 R13
1
2
3
4
5
6
7
8 c(0)
9 c(0)
10 c(2) c(0)
11 c(2) c(0)
12 c(4) c(2) c(0)
13 c(4) c(2) c(0)
14 c(6) c(4) c(2)
15 c(6) c(4) c(2)
16 c(0) c(6) c(4)
17 c(0) c(6) c(4)
18 c(2) c(0) c(6)
19 e(0) c(2) c(0) c(6)
20 c(4) e(0) c(2) c(0)
21 c(4) e(0) c(2) c(0)
22 c(6) c(4) e(0) c(2)
23 e(4) c(6) c(4) e(0) c(2)
24 c(0) e(4) c(6) c(4) e(0)
25 c(0) e(4) c(6) c(4) e(0)
26 c(2) c(0) e(4) c(6)
27 e(0) c(2) c(0) e(4) c(6)
28 c(4) e(0) c(2) c(0) e(4)
29 c(4) e(0) c(2) c(0)
30 c(6) c(4) e(0) c(2)
31 e(4) c(6) c(4) e(0) c(2)
32 c(0) e(4) c(6) c(4) e(0)
33 c(0) e(4) c(6) e(0)
34 c(2) c(0) e(4) c(6)
35 e(0) c(2) c(0) e(4)
36 e(0) c(2) c(0) e(4)
37 e(0) c(2) e(4)
38 c(6) e(0) c(2)
6. DWT-SA Architecture
The proposed architecture is shown in Figure 16 and its RTL
schematic is shown inFigure 17.It is obtained by systematic
analysis of the FRA register allocation in Table 3in
conjunction with the filter equations 1a and 1a.Delayconsist
of the DWT-SA architecture consists of the latency period
necessary to fill up the filter for the first time, plus the delay
through the registers asdescribed in Table 1.The output
coefficients are obtained from the final filter stage. The
proposed DWT-SA architecture uses only one filter. The
architecture is optimized in hardware by using only a single
multiplier and adder set in each filter celltogenerate all high
pass and low pass coefficient.The coefficients which are
present at the output of filter are stored in register. The
control logic controls the switching action of switch.The
data which are coming from delay input and register bank is
controlled by switch.According to the position of switch,one
of the data is select and performs filtering operations. This
process is continued up to the 53 clock cycle.
Figure 16: DWT-SA Architecture
To select the particular output coming from filter unit, one
of the small unit is used. When select line is zero, it selects
one data. Force zero blocks is used to set the values zero.
For controlling of this unit FSM is used. The select line is
connected to the FSM. When reset is zero, it produces
output. Otherwise it produces 0.This results in a simple and
efficient systolic implementation for the computation of1-
DWT and hence is suitable for VLSI implementation.
Paper ID: SUB1539 47
International Journal of Science and Research (IJSR)
ISSN (Online): 2319-7064
Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438
Volume 4 Issue 1, January 2015
www.ijsr.net
Licensed Under Creative Commons Attribution CC BY
Figure 17: RTL schematic of DWT-SA architecture
The RTL view of DWT-SA for Daubechies3 is shown in
Figure 18.This systolic array architecture is designed and
implemented using quartus II by device:
‘Cyclone;EP2C20F484C7’.Here two elements are used to
implement DWT-SA architecture.We get the same RTL
view for Haar and Coiflets wavelets
Figure 18: RTL view of DWT-SA for Daubechies3 wavelet
7. Simulation Results of DWT-SA Architecture
for Daubechies3, HaarandCoiflets1Wavelet
MATLAB is the software which is used in our project to
verify the result with VHDL simulation. The high pass and
low pass coefficients are calculated from MATLAB which
are in decimal form. We convert these into binary and
padding some zeroes in it. Because the filter cell and
multiplexer is designed for31 bit, 13 bit respectively. Out of
this13 bit, 12 bits are data bits and 1 MSB bit is signed bit.
The signed bit represents the positive or negative number.
All the implementation and simulation is done in Active
HDL7.1. Simulated waveform gives result for different band
select. When band select is ‘1’ the low pass coifficients are
selected and gives Approximation coefficients. When band
select is ‘0’ the high pass coifficients are selected and gives
Details coefficients. The result of DWT-SA is in
hexadecimal format.
Table 4 shows low pass coefficients of daubechies3,haar and
coiflets1 wavelet in hexadecimal form.
Table 4: Low pass Coifficients of db3, haar and coiflets1
wavelet in Hex form
Low pass
coefficients
db3
wavelet
Haar
wavelet
coiflets1
wavelet
LO_D0 0001 0012 0000
LO_D1 1002 0012 1002
LO_D2 1003 0 000A
LO_D3 000B 0 0015
LO_D4 0014 0 0008
LO_D5 0008 0 1002
Table 5, Table 6 and Table 7 shows the result in terms of
approximation coefficient in hexadecimal form of
daubechies3,haar and coiflets1 wavelet.
Table 5: Approximation Coefficients of db3wavelet
When band select is ‘1’
Approximation coefficients of db3
First stage output Second stage output Third stage output
ca1 ca11 ca111
-1 9 -54
7 -42 383
23 161 -6C9
24 51C 7AD8
1C E0 700
0 0 0
Table 6: Approximation Coefficients of Haar wavelet
When band select is ‘1’
Approximation coefficients of Haar
First stage output Second stage output Third stage output
ca1 ca11 ca111
24 510 88B0
24 288
Table 7: Approximation Coefficients of coiflets1 wavelet
When band select is ‘1’
Approximation coefficients of coiflets1
First stage output Second stage output Third stage output
ca1 ca11 ca111
-2 4 -8
1D B2 -31A
23 531 8065
25 160 -5E
6 -C 18
0 0 0
Table 8 shows high pass coefficients ofdaubechies3,haar and
coiflets1 wavelet in hexadecimal form.
Table 8: High pass coefficients of db3, haar and coiflets1
wavelet in Hex form
High pass
coefficients
db3
wavelet
Haar wavelet
coiflets1
wavelet
HO_D0 1008 1012 0002
HO_D1 0014 0012 0008
HO_D2 100B 0 1015
HO_D3 1003 0 000A
HO_D4 0002 0 0002
HO_D5 0001 0 0000
When band select is ‘0’ the high pass coifficients are
selected and gives Details coefficients.Table 9, Table 10 and
Table 11 shows the result in terms of detail coefficients in
Bandselect
Output(30:0)
D0(12:0)
D1(12:0)
D2(12:0)
PREV(30:0)
U1
filtercell1
Lo_D0(12:0)
Hi_D0(12:0)
bandselect
Prev(30:0)
Bandselect
Output(30:0)
D0(12:0)
D1(12:0)
D2(12:0)
PREV(30:0)
U2
filtercell1
Lo_D1(12:0)
Hi_D1(12:0)
Bandselect
Output(30:0)
D0(12:0)
D1(12:0)
D2(12:0)
PREV(30:0)
U3
filtercell1
Lo_D2(12:0)
Hi_D2(12:0)
Bandselect
Output(30:0)
D0(12:0)
D1(12:0)
D2(12:0)
PREV(30:0)
U4
filtercell1
Lo_D3(12:0)
Hi_D3(12:0)
Bandselect
Output(30:0)
D0(12:0)
D1(12:0)
D2(12:0)
PREV(30:0)
U5
filtercell1
Lo_D4(12:0)
Hi_D4(12:0)
Bandselect
Output(30:0)
D0(12:0)
D1(12:0)
D2(12:0)
PREV(30:0)
U6
filtercell1
Lo_D5(12:0)
Hi_D5(12:0)
clk
q(12:0)d(12:0)
rst
U7
latch13
clk
q(12:0)d(12:0)
rst
U8
latch13
clk
q(12:0)d(12:0)
rst
U9
latch13
clk
q(12:0)d(12:0)
rst
U10
latch13
clk
q(12:0)d(12:0)
rst
U11
latch13
rst
clk
clk
Output(30:0)filterin(30:0)R0(30:0)
rst
R1(30:0)
R10(30:0)
R12(30:0)
R2(30:0)
R3(30:0)
R4(30:0)
R5(30:0)
R6(30:0)
R7(30:0)
R8(30:0)
R9(30:0)
U12
regbank1
clk
Output(30:0)filterin(30:0)
R0(30:0)
rst
R1(30:0)
R10(30:0)
R12(30:0)
R2(30:0)
R3(30:0)
R4(30:0)
R5(30:0)
R6(30:0)
R7(30:0)
R8(30:0)
R9(30:0)
U13
regbank1
a(12:0)
y(12:0)
b(30:0)
c(30:0)
d(30:0)
sel(3:0)
Rst
Output(3:0)
clk
U19
dwtcl2
a(12:0)
y(12:0)
b(30:0)
c(30:0)
d(30:0)
sel(3:0)
U23
sw3
U14
sw3
a(12:0)
y(12:0)
b(30:0)
c(30:0)
d(30:0)
sel(3:0)
U15
sw3
a(12:0)
y(12:0)
b(30:0)
c(30:0)
d(30:0)
sel(3:0)
U16
sw3
a(12:0)
y(12:0)
b(30:0)
c(30:0)
d(30:0)
sel(3:0)
U17
sw3
a(12:0)
y(12:0)b(30:0)
c(30:0)
d(30:0)
sel(3:0)
U18
sw3
clk datain(12:0)
rstin rstout
U20
fsmforin1
a(30:0)y(30:0)
sel
U24
forcezero1
clkfzc
rst
U25
forcezerofsm1
outsq(30:0) out0(30:0)
sel(3:0) out1(30:0)
out2(30:0)
U21
demux301
stage0(30:0)
stage1(30:0)
stage2(30:0)
1
00000000
0000 00011008
00000000
00011008
1
00000000
1
00000000
0000 10020014
00000000
10020014
1
00000000
0000 1003100B
00000000
1003100B
1
00000000
0000 000B1003
00000000
000B1003
1
00000000
0000 00140002
00000000
00140002
1
00000000
0000 00080001
00000000
00080001
0
0001
0000
0 0
00010001
0 0
00010001
0 0
00010001
0 0
00010001
0
0
0
0
00000000
00000000
00000023
0
00000009
00000000
00000000
00000007
00000000
40000001
00000000
00000000
00000000
00000000
00000000
0
00000000
00000000
00000000
0
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
00000000
0001
0000
000000070000000700000000
4
0
4
0
0000
0000
00000023
00000023 00000009
4
0001
0000
400000014000000100000000
4
0001
0000
000000000000000000000000
4
0001
0000
000000000000000000000000
4
0001
0000
000000000000000000000000
4
0 0000
0 0
00000000
00000000
0
0
0
0
00000000 00000000
4 00000000
00000000
00000000
00000000
00000000
Paper ID: SUB1539 48
International Journal of Science and Research (IJSR)
ISSN (Online): 2319-7064
Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438
Volume 4 Issue 1, January 2015
www.ijsr.net
Licensed Under Creative Commons Attribution CC BY
hexadecimal form of daubechies3, haar and coiflets1
wavelet in hexadecimal form.
Table 9: Details coefficients of db3 wavelet
When band select is ‘0’
Details coefficients of db3
First stage output Second stage output Third stage output
cd1 cd11 cd111
C 100 1110
-2 05E 086E
1 0BA 114
-B -1E 75
3 3 3
0 0 0
Table 10: Details coefficients of Haar wavelet
When band select is ‘0’
Details coefficients of Haar
First stage output Second stage output Third stage output
cd1 cd11 cd111
0 0 0
0 0 0
Table-11:Details coefficients of coiflets1 wavelet
When band select is ‘0’
Details coefficients of coiflets1
First stage output Second stage output Third stage output
cd1 cd11 cd111
-A 4E 34E
-1 6F 9D
1 D5 906
-9 2 4
2 0 0
0 0 0
The Systolic array architecture of DWT is synthesized,
placed and routed for cyclone device EP2C20F484C7. The
levels of logic cells reported are 8% for Daubechies3, 5%
for Haar, 2% for coiflets1. The minimum clock period
reported by the synthesis tool after placing and routing is
given in Table 12. The Area Analysis reports of each
wavelet for Discrete Wavelet Transform Systolic array
architecture implementation are given in Table 13.
Table 12: Time Analysis for device EP2C20F484C7
Parameters Daubechies3 haar Coiflet1
Clock Period(ns) 53.508 ns 53.743 ns 51.892 ns
Clock
Speed(MHz)
18.69
MHz 18.61 MHz 19.27 MHz
Table 13: Area Analysis for device EP2C20F484C7
Parameters Daubechies3 haar Coiflet1
Total Logic Elements 1515, 8% 955,5% 1407,2%
Total Combinational
Functions 1320, 7% 907,5% 1212,6%
Dedicated Logic
Registers 336, 2% 134,1% 336, 2%
LUT-only LCs 1179(1) 821(0) 1071(1)
Register-Only LCs 195(0) 48(0) 195(0)
LUT/Register LCs 141(0) 86(0) 141(0)
For the verification of VLSI results, MATLAB is used. The
approximation and detail coefficient are obtained from
MATLAB.Figure 19shows the simulation waveform for
approximation coefficients and Figure 20shows the
simulation waveform for details coefficients for
Daubechies3 wavelet.
Figure 19: Simulation result of Approximation coefficients
for Daubechies3
Figure 20: Simulation result of Details coefficients for
Daubechies3
Figure 21 and Figure 22 shows the simulation waveform for
approximation coefficients and details coefficients of Haar
wavelet.
Paper ID: SUB1539 49
International Journal of Science and Research (IJSR)
ISSN (Online): 2319-7064
Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438
Volume 4 Issue 1, January 2015
www.ijsr.net
Licensed Under Creative Commons Attribution CC BY
Figure 21: Simulation result of Approximation coefficients
for Haar
Also, Figure 23 and Figure 24shows the simulation
waveform for approximation coefficients and details
coefficients of Coiflets1 wavelet.
Figure 22: Simulation result of Details coefficients for Haar
Figure 23: Simulation result of Approximation coefficients
for Coiflets1
Figure 24: Simulation result of Details coefficients for
Coiflets1
8. Conclusion
The proposed DWT-SA architecture computes N
coefficients in N clock cycles and achieves real time
operation by executing computations of higher octave
coefficients in between the first octave coefficient
computations. The first octave computations are scheduled
for every N/4cycles, while the second and third octaves are
scheduled every N/2 and N clock cycles respectively.
The main objective of this work is to develop a design
resource for designers to implement systolic array
architecture of DWT for various wavelets according to the
design need such as clock speed, area and time. Systolic
array architecture has efficient hardware utilization and it
works with data streams of arbitrary size. The design is
cascadable, for computation of one, two and three
decomposition level. Systolic array Architecture is
implemented using Very High Speed Integrated Circuit
Paper ID: SUB1539 50
International Journal of Science and Research (IJSR)
ISSN (Online): 2319-7064
Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438
Volume 4 Issue 1, January 2015
www.ijsr.net
Licensed Under Creative Commons Attribution CC BY
(VHSIC) Hardware Description Language (VHDL) and then
are simulate using Active HDL.
References
[1] N. Jayant, "High Quality Networking of Audio-Visual
Information", IEEE Communications Magazine, pp.
84-95, September 1993.
[2] E. A. Fox, "Advances in Interactive Digital Multimedia
Systems", IEEE Computer Magazine, pp. 9-21, October
1991.
[3] P. H. Ang, P. A. Ruetz and D. Auld, "Video
Compression Makes Big Gains", IEEE Spectrum
Magazine, pp. 16-19, October 1991.
[4] A. K. Jain, "Image Data Compression: A Review",
Proceedings of the IEEE, Vol. 69, No. 3, pp. 349- 389,
March 1981.
[5] K. R. Rao and P. Yip, Discrete Cosine Transform -
Algorithms, Advantages, Applications. Academic Press:
San Diego, 1990.
[6] H. G. Musmann, P. Pirsch and H. J. Grailerr, "Advances
in Picture Coding", Proceedings of the 1EEE, Vol. 73,
No. 4, pp. 523-548, April 1985.
[7] A. bl. Netravali and B. G. Haskell, Digital Pictures,
Plenum: New York, 1989.
[8] FBI Systems Technology Unit: "Gray Scale Fingerprint
image Compression Status Report, Nov. 4 1991.
[9] Chao-Tsung Huang, Po-Chih Tseng, and Liang-Gee
Chen,” Analysis and VLSI Architecture for 1-D and 2-
D Discrete Wavelet Transform”, IEEE Transactions on
signal processing, vol. 53, No. 4, April 2005
[10]Chih-Chi Cheng, Chao-Tsung Huang, Ching-Yeh Chen,
Chung-JrLian, and Liang-Gee Chen,” On-Chip Memory
Optimization Scheme for VLSI Implementation of
Line-Based Two-Dimentional Discrete Wavelet
Transform”, IEEE Transactions on circuits and systems
for video technology, vol. 17, no. 7, July 2007
[11]XinTian, Lin Wu, Yi-Hua Tan, and Jin-Wen Tian,”
Efficient Multi-Input/Multi-Output VLSI Architecture
for Two-Dimensional Lifting-Based Discrete Wavelet
Transform”, IEEE transactions on computers, vol. 60,
no. 8, August 2011
[12]Sze-Wei Lee, Soon-Chieh Lim,” VLSI Design of a
Wavelet Processing Core”, IEEE transactions on
circuits and systems for video technology, vol. 16, no.
11, November 2006
[13]Chao Cheng, Keshab K. Parhi,” High-Speed VLSI
Implementation of 2-D Discrete Wavelet Transform”,
IEEE transactions on signal processing, vol. 56, no. 1,
January 2008
Author Profile
Ms. RashmiPatil received her B. Eng. Degree in Electronics &
Communication from S.S.G.M.C.E. Shegaon, India in 2010 and
M.Tech degree in Electronics from R.T.M.N.U. Nagpur, India.
Currently she is a research scholar in B.D.C.O.E., Sevagram, India.
Her area of interest is applied electronics, VLSI, VHDL, and Low
Power optimization.
Dr. M. T. Kolte has completed his B.Tech, M.E., and Ph.D in
Electronics and Telecommunication. He is working as Head of
Dept.in M.I.T.C.O.E., Pune, India. He has presented and published
many papers in National and International Conference.
Paper ID: SUB1539 51

More Related Content

What's hot

Hamming net based Low Complexity Successive Cancellation Polar Decoder
Hamming net based Low Complexity Successive Cancellation Polar DecoderHamming net based Low Complexity Successive Cancellation Polar Decoder
Hamming net based Low Complexity Successive Cancellation Polar DecoderRSIS International
 
Developing digital signal clustering method using local binary pattern histog...
Developing digital signal clustering method using local binary pattern histog...Developing digital signal clustering method using local binary pattern histog...
Developing digital signal clustering method using local binary pattern histog...IJECEIAES
 
Ecg signal compression for diverse transforms
Ecg signal compression for diverse transformsEcg signal compression for diverse transforms
Ecg signal compression for diverse transformsAlexander Decker
 
FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT US...
FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT US...FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT US...
FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT US...VLSICS Design
 
Performance Analysis of OFDM Transceiver with Folded FFT and LMS Filter
Performance Analysis of OFDM Transceiver with Folded FFT and LMS FilterPerformance Analysis of OFDM Transceiver with Folded FFT and LMS Filter
Performance Analysis of OFDM Transceiver with Folded FFT and LMS Filteridescitation
 
A HYBRID DENOISING APPROACH FOR SPECKLE NOISE REDUCTION IN ULTRASONIC B-MODE ...
A HYBRID DENOISING APPROACH FOR SPECKLE NOISE REDUCTION IN ULTRASONIC B-MODE ...A HYBRID DENOISING APPROACH FOR SPECKLE NOISE REDUCTION IN ULTRASONIC B-MODE ...
A HYBRID DENOISING APPROACH FOR SPECKLE NOISE REDUCTION IN ULTRASONIC B-MODE ...csijjournal
 
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...CSCJournals
 
Reduced Complexity Transfer Function Computation for Complex Indoor Channels ...
Reduced Complexity Transfer Function Computation for Complex Indoor Channels ...Reduced Complexity Transfer Function Computation for Complex Indoor Channels ...
Reduced Complexity Transfer Function Computation for Complex Indoor Channels ...Ramoni Adeogun, PhD
 
64 point fft chip
64 point fft chip64 point fft chip
64 point fft chipShalyJ
 
Analysis of Microstrip Finger on Bandwidth of Interdigital Band Pass Filter u...
Analysis of Microstrip Finger on Bandwidth of Interdigital Band Pass Filter u...Analysis of Microstrip Finger on Bandwidth of Interdigital Band Pass Filter u...
Analysis of Microstrip Finger on Bandwidth of Interdigital Band Pass Filter u...IJREST
 
Radio Signal Classification with Deep Neural Networks
Radio Signal Classification with Deep Neural NetworksRadio Signal Classification with Deep Neural Networks
Radio Signal Classification with Deep Neural NetworksKachi Odoemene
 
Reducting Power Dissipation in Fir Filter: an Analysis
Reducting Power Dissipation in Fir Filter: an AnalysisReducting Power Dissipation in Fir Filter: an Analysis
Reducting Power Dissipation in Fir Filter: an AnalysisCSCJournals
 
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...csandit
 
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...cscpconf
 

What's hot (20)

F43063841
F43063841F43063841
F43063841
 
Hamming net based Low Complexity Successive Cancellation Polar Decoder
Hamming net based Low Complexity Successive Cancellation Polar DecoderHamming net based Low Complexity Successive Cancellation Polar Decoder
Hamming net based Low Complexity Successive Cancellation Polar Decoder
 
Developing digital signal clustering method using local binary pattern histog...
Developing digital signal clustering method using local binary pattern histog...Developing digital signal clustering method using local binary pattern histog...
Developing digital signal clustering method using local binary pattern histog...
 
Ecg signal compression for diverse transforms
Ecg signal compression for diverse transformsEcg signal compression for diverse transforms
Ecg signal compression for diverse transforms
 
CINAF protocol
CINAF protocolCINAF protocol
CINAF protocol
 
FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT US...
FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT US...FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT US...
FPGA IMPLEMENTATION OF EFFICIENT VLSI ARCHITECTURE FOR FIXED POINT 1-D DWT US...
 
Performance Analysis of OFDM Transceiver with Folded FFT and LMS Filter
Performance Analysis of OFDM Transceiver with Folded FFT and LMS FilterPerformance Analysis of OFDM Transceiver with Folded FFT and LMS Filter
Performance Analysis of OFDM Transceiver with Folded FFT and LMS Filter
 
paper
paperpaper
paper
 
A HYBRID DENOISING APPROACH FOR SPECKLE NOISE REDUCTION IN ULTRASONIC B-MODE ...
A HYBRID DENOISING APPROACH FOR SPECKLE NOISE REDUCTION IN ULTRASONIC B-MODE ...A HYBRID DENOISING APPROACH FOR SPECKLE NOISE REDUCTION IN ULTRASONIC B-MODE ...
A HYBRID DENOISING APPROACH FOR SPECKLE NOISE REDUCTION IN ULTRASONIC B-MODE ...
 
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
A Combined Voice Activity Detector Based On Singular Value Decomposition and ...
 
Reduced Complexity Transfer Function Computation for Complex Indoor Channels ...
Reduced Complexity Transfer Function Computation for Complex Indoor Channels ...Reduced Complexity Transfer Function Computation for Complex Indoor Channels ...
Reduced Complexity Transfer Function Computation for Complex Indoor Channels ...
 
64 point fft chip
64 point fft chip64 point fft chip
64 point fft chip
 
Es25893896
Es25893896Es25893896
Es25893896
 
Analysis of Microstrip Finger on Bandwidth of Interdigital Band Pass Filter u...
Analysis of Microstrip Finger on Bandwidth of Interdigital Band Pass Filter u...Analysis of Microstrip Finger on Bandwidth of Interdigital Band Pass Filter u...
Analysis of Microstrip Finger on Bandwidth of Interdigital Band Pass Filter u...
 
D0341015020
D0341015020D0341015020
D0341015020
 
Radio Signal Classification with Deep Neural Networks
Radio Signal Classification with Deep Neural NetworksRadio Signal Classification with Deep Neural Networks
Radio Signal Classification with Deep Neural Networks
 
Gt3612201224
Gt3612201224Gt3612201224
Gt3612201224
 
Reducting Power Dissipation in Fir Filter: an Analysis
Reducting Power Dissipation in Fir Filter: an AnalysisReducting Power Dissipation in Fir Filter: an Analysis
Reducting Power Dissipation in Fir Filter: an Analysis
 
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...
Performance evaluations of grioryan fft and cooley tukey fft onto xilinx virt...
 
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...
PERFORMANCE EVALUATIONS OF GRIORYAN FFT AND COOLEY-TUKEY FFT ONTO XILINX VIRT...
 

Viewers also liked

τομεας πληροφορικής12
τομεας πληροφορικής12τομεας πληροφορικής12
τομεας πληροφορικής1210epal-athin
 
Interpolation wikipedia
Interpolation   wikipediaInterpolation   wikipedia
Interpolation wikipediahort34
 
βοηθών βρεφονηπιοκόμων
βοηθών βρεφονηπιοκόμωνβοηθών βρεφονηπιοκόμων
βοηθών βρεφονηπιοκόμων10epal-athin
 

Viewers also liked (19)

Sub1527
Sub1527Sub1527
Sub1527
 
Environmental Awareness in Motor Vehicle Garages in the City of Mbeya, Tanzania
Environmental Awareness in Motor Vehicle Garages in the City of Mbeya, TanzaniaEnvironmental Awareness in Motor Vehicle Garages in the City of Mbeya, Tanzania
Environmental Awareness in Motor Vehicle Garages in the City of Mbeya, Tanzania
 
Sub1573
Sub1573Sub1573
Sub1573
 
Sub1594
Sub1594Sub1594
Sub1594
 
Sub1559
Sub1559Sub1559
Sub1559
 
Sub1599
Sub1599Sub1599
Sub1599
 
Sub1590
Sub1590Sub1590
Sub1590
 
τομεας πληροφορικής12
τομεας πληροφορικής12τομεας πληροφορικής12
τομεας πληροφορικής12
 
Interpolation wikipedia
Interpolation   wikipediaInterpolation   wikipedia
Interpolation wikipedia
 
Sub1562
Sub1562Sub1562
Sub1562
 
Project
ProjectProject
Project
 
Sub1567
Sub1567Sub1567
Sub1567
 
A Rare Case of Lingual Thyroid with Hypothyroidism
A Rare Case of Lingual Thyroid with HypothyroidismA Rare Case of Lingual Thyroid with Hypothyroidism
A Rare Case of Lingual Thyroid with Hypothyroidism
 
Sub1578
Sub1578Sub1578
Sub1578
 
Sub14176
Sub14176Sub14176
Sub14176
 
Nos
NosNos
Nos
 
βοηθών βρεφονηπιοκόμων
βοηθών βρεφονηπιοκόμωνβοηθών βρεφονηπιοκόμων
βοηθών βρεφονηπιοκόμων
 
Sub1582
Sub1582Sub1582
Sub1582
 
Piezoelectric Diesel Injectors & Emission Control
Piezoelectric Diesel Injectors & Emission ControlPiezoelectric Diesel Injectors & Emission Control
Piezoelectric Diesel Injectors & Emission Control
 

Similar to Sub1539

Efficient FPGA implementation of high speed digital delay for wideband beamfor...
Efficient FPGA implementation of high speed digital delay for wideband beamfor...Efficient FPGA implementation of high speed digital delay for wideband beamfor...
Efficient FPGA implementation of high speed digital delay for wideband beamfor...journalBEEI
 
Analysis of different FIR Filter Design Method in terms of Resource Utilizati...
Analysis of different FIR Filter Design Method in terms of Resource Utilizati...Analysis of different FIR Filter Design Method in terms of Resource Utilizati...
Analysis of different FIR Filter Design Method in terms of Resource Utilizati...ijsrd.com
 
IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...
IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...
IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...IRJET Journal
 
Zero rotation aproach for droop improvement in
Zero rotation aproach for droop improvement inZero rotation aproach for droop improvement in
Zero rotation aproach for droop improvement ineSAT Publishing House
 
4 ijaems nov-2015-4-fsk demodulator- case study of pll application
4 ijaems nov-2015-4-fsk demodulator- case study of pll application4 ijaems nov-2015-4-fsk demodulator- case study of pll application
4 ijaems nov-2015-4-fsk demodulator- case study of pll applicationINFOGAIN PUBLICATION
 
Design And Implementation of Combined Pipelining and Parallel Processing Arch...
Design And Implementation of Combined Pipelining and Parallel Processing Arch...Design And Implementation of Combined Pipelining and Parallel Processing Arch...
Design And Implementation of Combined Pipelining and Parallel Processing Arch...VLSICS Design
 
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...VLSICS Design
 
DSP_2018_FOEHU - Lec 05 - Digital Filters
DSP_2018_FOEHU - Lec 05 - Digital FiltersDSP_2018_FOEHU - Lec 05 - Digital Filters
DSP_2018_FOEHU - Lec 05 - Digital FiltersAmr E. Mohamed
 
Design of Optimized FIR Filter Using FCSD Representation
Design  of  Optimized  FIR  Filter  Using  FCSD Representation Design  of  Optimized  FIR  Filter  Using  FCSD Representation
Design of Optimized FIR Filter Using FCSD Representation IJEEE
 
Design of all digital phase locked loop
Design of all digital phase locked loopDesign of all digital phase locked loop
Design of all digital phase locked loopeSAT Publishing House
 
IRJET- Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
IRJET-  	  Implementation of Reversible Radix-2 FFT VLSI Architecture using P...IRJET-  	  Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
IRJET- Implementation of Reversible Radix-2 FFT VLSI Architecture using P...IRJET Journal
 
Digital Implementation of Costas Loop with Carrier Recovery
Digital Implementation of Costas Loop with Carrier RecoveryDigital Implementation of Costas Loop with Carrier Recovery
Digital Implementation of Costas Loop with Carrier RecoveryIJERD Editor
 
Design of all digital phase locked loop (d pll) with fast acquisition time
Design of all digital phase locked loop (d pll) with fast acquisition timeDesign of all digital phase locked loop (d pll) with fast acquisition time
Design of all digital phase locked loop (d pll) with fast acquisition timeeSAT Journals
 
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...VLSICS Design
 
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...VLSICS Design
 
Transpose Form Fir Filter Design for Fixed and Reconfigurable Coefficients
Transpose Form Fir Filter Design for Fixed and Reconfigurable CoefficientsTranspose Form Fir Filter Design for Fixed and Reconfigurable Coefficients
Transpose Form Fir Filter Design for Fixed and Reconfigurable CoefficientsIRJET Journal
 
IRJET- Filter Design for Educational Set Via Labview Software Program
IRJET- Filter Design for Educational Set Via Labview Software ProgramIRJET- Filter Design for Educational Set Via Labview Software Program
IRJET- Filter Design for Educational Set Via Labview Software ProgramIRJET Journal
 
Area, Delay and Power Comparison of Adder Topologies
Area, Delay and Power Comparison of Adder TopologiesArea, Delay and Power Comparison of Adder Topologies
Area, Delay and Power Comparison of Adder TopologiesVLSICS Design
 
Area Efficient 9/7 Wavelet Coefficient based 2-D DWT using Modified CSLA Tech...
Area Efficient 9/7 Wavelet Coefficient based 2-D DWT using Modified CSLA Tech...Area Efficient 9/7 Wavelet Coefficient based 2-D DWT using Modified CSLA Tech...
Area Efficient 9/7 Wavelet Coefficient based 2-D DWT using Modified CSLA Tech...IRJET Journal
 

Similar to Sub1539 (20)

Aw4102359364
Aw4102359364Aw4102359364
Aw4102359364
 
Efficient FPGA implementation of high speed digital delay for wideband beamfor...
Efficient FPGA implementation of high speed digital delay for wideband beamfor...Efficient FPGA implementation of high speed digital delay for wideband beamfor...
Efficient FPGA implementation of high speed digital delay for wideband beamfor...
 
Analysis of different FIR Filter Design Method in terms of Resource Utilizati...
Analysis of different FIR Filter Design Method in terms of Resource Utilizati...Analysis of different FIR Filter Design Method in terms of Resource Utilizati...
Analysis of different FIR Filter Design Method in terms of Resource Utilizati...
 
IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...
IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...
IRJET- VLSI Architecture for Reversible Radix-2 FFT Algorithm using Programma...
 
Zero rotation aproach for droop improvement in
Zero rotation aproach for droop improvement inZero rotation aproach for droop improvement in
Zero rotation aproach for droop improvement in
 
4 ijaems nov-2015-4-fsk demodulator- case study of pll application
4 ijaems nov-2015-4-fsk demodulator- case study of pll application4 ijaems nov-2015-4-fsk demodulator- case study of pll application
4 ijaems nov-2015-4-fsk demodulator- case study of pll application
 
Design And Implementation of Combined Pipelining and Parallel Processing Arch...
Design And Implementation of Combined Pipelining and Parallel Processing Arch...Design And Implementation of Combined Pipelining and Parallel Processing Arch...
Design And Implementation of Combined Pipelining and Parallel Processing Arch...
 
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...
 
DSP_2018_FOEHU - Lec 05 - Digital Filters
DSP_2018_FOEHU - Lec 05 - Digital FiltersDSP_2018_FOEHU - Lec 05 - Digital Filters
DSP_2018_FOEHU - Lec 05 - Digital Filters
 
Design of Optimized FIR Filter Using FCSD Representation
Design  of  Optimized  FIR  Filter  Using  FCSD Representation Design  of  Optimized  FIR  Filter  Using  FCSD Representation
Design of Optimized FIR Filter Using FCSD Representation
 
Design of all digital phase locked loop
Design of all digital phase locked loopDesign of all digital phase locked loop
Design of all digital phase locked loop
 
IRJET- Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
IRJET-  	  Implementation of Reversible Radix-2 FFT VLSI Architecture using P...IRJET-  	  Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
IRJET- Implementation of Reversible Radix-2 FFT VLSI Architecture using P...
 
Digital Implementation of Costas Loop with Carrier Recovery
Digital Implementation of Costas Loop with Carrier RecoveryDigital Implementation of Costas Loop with Carrier Recovery
Digital Implementation of Costas Loop with Carrier Recovery
 
Design of all digital phase locked loop (d pll) with fast acquisition time
Design of all digital phase locked loop (d pll) with fast acquisition timeDesign of all digital phase locked loop (d pll) with fast acquisition time
Design of all digital phase locked loop (d pll) with fast acquisition time
 
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...
 
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...
DESIGN AND IMPLEMENTATION OF COMBINED PIPELINING AND PARALLEL PROCESSING ARCH...
 
Transpose Form Fir Filter Design for Fixed and Reconfigurable Coefficients
Transpose Form Fir Filter Design for Fixed and Reconfigurable CoefficientsTranspose Form Fir Filter Design for Fixed and Reconfigurable Coefficients
Transpose Form Fir Filter Design for Fixed and Reconfigurable Coefficients
 
IRJET- Filter Design for Educational Set Via Labview Software Program
IRJET- Filter Design for Educational Set Via Labview Software ProgramIRJET- Filter Design for Educational Set Via Labview Software Program
IRJET- Filter Design for Educational Set Via Labview Software Program
 
Area, Delay and Power Comparison of Adder Topologies
Area, Delay and Power Comparison of Adder TopologiesArea, Delay and Power Comparison of Adder Topologies
Area, Delay and Power Comparison of Adder Topologies
 
Area Efficient 9/7 Wavelet Coefficient based 2-D DWT using Modified CSLA Tech...
Area Efficient 9/7 Wavelet Coefficient based 2-D DWT using Modified CSLA Tech...Area Efficient 9/7 Wavelet Coefficient based 2-D DWT using Modified CSLA Tech...
Area Efficient 9/7 Wavelet Coefficient based 2-D DWT using Modified CSLA Tech...
 

More from International Journal of Science and Research (IJSR)

More from International Journal of Science and Research (IJSR) (20)

Innovations in the Diagnosis and Treatment of Chronic Heart Failure
Innovations in the Diagnosis and Treatment of Chronic Heart FailureInnovations in the Diagnosis and Treatment of Chronic Heart Failure
Innovations in the Diagnosis and Treatment of Chronic Heart Failure
 
Design and implementation of carrier based sinusoidal pwm (bipolar) inverter
Design and implementation of carrier based sinusoidal pwm (bipolar) inverterDesign and implementation of carrier based sinusoidal pwm (bipolar) inverter
Design and implementation of carrier based sinusoidal pwm (bipolar) inverter
 
Polarization effect of antireflection coating for soi material system
Polarization effect of antireflection coating for soi material systemPolarization effect of antireflection coating for soi material system
Polarization effect of antireflection coating for soi material system
 
Image resolution enhancement via multi surface fitting
Image resolution enhancement via multi surface fittingImage resolution enhancement via multi surface fitting
Image resolution enhancement via multi surface fitting
 
Ad hoc networks technical issues on radio links security &amp; qo s
Ad hoc networks technical issues on radio links security &amp; qo sAd hoc networks technical issues on radio links security &amp; qo s
Ad hoc networks technical issues on radio links security &amp; qo s
 
Microstructure analysis of the carbon nano tubes aluminum composite with diff...
Microstructure analysis of the carbon nano tubes aluminum composite with diff...Microstructure analysis of the carbon nano tubes aluminum composite with diff...
Microstructure analysis of the carbon nano tubes aluminum composite with diff...
 
Improving the life of lm13 using stainless spray ii coating for engine applic...
Improving the life of lm13 using stainless spray ii coating for engine applic...Improving the life of lm13 using stainless spray ii coating for engine applic...
Improving the life of lm13 using stainless spray ii coating for engine applic...
 
An overview on development of aluminium metal matrix composites with hybrid r...
An overview on development of aluminium metal matrix composites with hybrid r...An overview on development of aluminium metal matrix composites with hybrid r...
An overview on development of aluminium metal matrix composites with hybrid r...
 
Pesticide mineralization in water using silver nanoparticles incorporated on ...
Pesticide mineralization in water using silver nanoparticles incorporated on ...Pesticide mineralization in water using silver nanoparticles incorporated on ...
Pesticide mineralization in water using silver nanoparticles incorporated on ...
 
Comparative study on computers operated by eyes and brain
Comparative study on computers operated by eyes and brainComparative study on computers operated by eyes and brain
Comparative study on computers operated by eyes and brain
 
T s eliot and the concept of literary tradition and the importance of allusions
T s eliot and the concept of literary tradition and the importance of allusionsT s eliot and the concept of literary tradition and the importance of allusions
T s eliot and the concept of literary tradition and the importance of allusions
 
Effect of select yogasanas and pranayama practices on selected physiological ...
Effect of select yogasanas and pranayama practices on selected physiological ...Effect of select yogasanas and pranayama practices on selected physiological ...
Effect of select yogasanas and pranayama practices on selected physiological ...
 
Grid computing for load balancing strategies
Grid computing for load balancing strategiesGrid computing for load balancing strategies
Grid computing for load balancing strategies
 
A new algorithm to improve the sharing of bandwidth
A new algorithm to improve the sharing of bandwidthA new algorithm to improve the sharing of bandwidth
A new algorithm to improve the sharing of bandwidth
 
Main physical causes of climate change and global warming a general overview
Main physical causes of climate change and global warming   a general overviewMain physical causes of climate change and global warming   a general overview
Main physical causes of climate change and global warming a general overview
 
Performance assessment of control loops
Performance assessment of control loopsPerformance assessment of control loops
Performance assessment of control loops
 
Capital market in bangladesh an overview
Capital market in bangladesh an overviewCapital market in bangladesh an overview
Capital market in bangladesh an overview
 
Faster and resourceful multi core web crawling
Faster and resourceful multi core web crawlingFaster and resourceful multi core web crawling
Faster and resourceful multi core web crawling
 
Extended fuzzy c means clustering algorithm in segmentation of noisy images
Extended fuzzy c means clustering algorithm in segmentation of noisy imagesExtended fuzzy c means clustering algorithm in segmentation of noisy images
Extended fuzzy c means clustering algorithm in segmentation of noisy images
 
Parallel generators of pseudo random numbers with control of calculation errors
Parallel generators of pseudo random numbers with control of calculation errorsParallel generators of pseudo random numbers with control of calculation errors
Parallel generators of pseudo random numbers with control of calculation errors
 

Sub1539

  • 1. International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438 Volume 4 Issue 1, January 2015 www.ijsr.net Licensed Under Creative Commons Attribution CC BY Design of High Speed VLSI Architecturefor 1-D Discrete Wavelet Transform Rashmi Patil1 , Dr. M. T. Kolte2 1 Research Student, Department of Electronics, B. D. C. O. E., Wardha, Maharashtra, India 2 Professor and H .O. D. of ENTC Department, India M. I. T. C. O. E., Pune, Maharashtra, India Abstract: The work presents an implementation of discrete wavelet transform (DWT) using systolic array architecture in VLSI. The architecture consists of filter unit, storage unit and control unit. This performs calculations of low pass and high pass coefficients by using only one multiplier. This architecture has been implemented and simulated using VLSI. The hardware utilization efficiency is more as compared to the referred due to the FBRA scheme. The systolic nature of this architecture corresponds to a clock speed of 19.27MHz forCoiflets1 wavelet as compared to other two wavelets. It has advantage for optimizing area and time. The architecture is modular and cascadable for one or multi-dimensional DWT. Keywords: DWT, FRA, hardware efficiency, six tap Fir Filter, Systolic Array 1. Introduction In the last decade, there has been an enormous increase in the applications of wavelets in various scientific disciplines to relate the discrete-time filter bank with the theory of continuous time function space. Typical applications of wavelets include signal processing, image processing, numerical analysis, statistics, biomedicine, etc.[1], [2]. Wavelet transform offers a wide variety of useful features, in contrast to other transforms, such as Fourier transform or cosine transform. Some of these are as follows: • Computational complexity of O (N), where N is the number of data samples [3] • Efficient VLSI implementation [4] • Low computational complexity, flexibility in representing non stationary image signals Redundancies in video sequence can be removed by using Discrete Cosine Transform (DCT). DCT suffers from the negative effects of blackness andmosquito noise resulting in poor subjective quality of reconstructed images at high compression [5].In order to meet the real time requirements in many applications,design and implementation of DWT is required [6], [7]. 2. Discrete Wavelet Transform DWT which is based onsub band coding,is fastcomputation wavelet transform.It is easy to implement and reduces the computation time and resources required.Wavelets can be realized by iteration of filters with scaling.The resolution of the signal,which is a measure of the amount of detail information in the signal,is determined by the filtering operations, and the scaleis determined by upsampling and down sampling (subsampling) operations [2]. A schematic of three stage DWT decomposition is shown in Figure1. Figure 1: Three Stage DWT Decomposition using Pyramid Algorithm In Figure 1, the signal is denoted by the sequence a[n],where n isan integer. The low pass filter is denoted by L1 while the high pass filter is denoted by H1. At eachlevel,the high pass filter produces detail information b[n], while the low passfilterassociated with scaling function produces coarse approximation,c[n]. Here the input signal a[n] has N samples.At the first decomposition level,the signal is passed through the high pass and low pass filters,followed by sub sampling by 2.The output of the high pass filter has N/2 samples and b[n].These N/2 samples constitute the first level of DWT coefficients.The output of the low pass filter also has N/2 samples and c[n]. The signal is then passed through low pass and high pass filters for furtherdecomposition. The output of the second low pass filter followed bysub sampling has N/4 samples and e[n].The output of the second high pass filter followed by subsampling has N/4 samples and d[n].The second high pass filterconstitutes the second level ofDWT coefficients.The low pass filter output is then filtered once again for further decomposition and produces g[n],f[n] with N/8 samples.The filtering and decimation process is continued until the desired levelis reached.The maximum number of levels depends on the length of the signal. Paper ID: SUB1539 42
  • 2. International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438 Volume 4 Issue 1, January 2015 www.ijsr.net Licensed Under Creative Commons Attribution CC BY 3. Data Dependencies within DWT The wavelet decomposition of a 1-D input signal for three stages is shown in Figure 1. The transfer functions of the sixth order high pass (g(n)) and low pass h(n)) filter can be expressed as follows: High(z) = go+g1z-1 +g2z-2 +g3z-3 +g4z-4 +g5z-5 (1a) Low(z) = ho+h1z-1 +h2z-2 +h3z-3 +h4z-4 +h5z-5 (1b) For clarity the intermediate and final DWT coefficientsin Figure 1 are denoted by a,b,c,d,e,f,and g.The DWT computations are complexbecause of the data dependencies at different octaves.Equations 2a-2n shows the relationship among a, b, c, d, e, f, and g. 1st octave: b(0) = g(0)a(0) + g(1)a(-1) + g(2)a(-2) + g(3)a(-3) + g(4)a(-4) + g(5)a(-5) (2a) b(2) = g(0)a(2) + g(1)a(1) + g(2)a(0) + g(3)a(-1) + g(4)a(-2) + g(5)a(- 3) (2b) b(4) = g(0)a(4) + g(1)a(3) + g(2)a(2) + g(3)a(1) + g(4)a(0) + g(5)a(-1) (2c) b(6) = g(0)a(6) + g(1)a(5) + g(2)a(4) + g(3)a(3) + g(4)a(2) + g(5)a(1) (2d) c(0) = h(0)a(0) + h(1)a(-1) + h(2)a(-2) + h(3)a(-3) + h(4)a(-4) + h(5)a(-5) (2e) c(2) = h(0)a(2) + h(1)a(1) + h(2)a(0) + h(3)a(-1) + h(4)a(-2) + h(5)a(- 3) (2f) c(4) = h(0)a(4) + h(1)a(3) + h(2)a(2) + h(3)a(1) + h(4)a(0) + h(5)a(-1) (2g) c(6) = h(0)a(6) + h(1)a(5) + h(2)a(4) + h(3)a(3) + h(4)a(2) + h(5)a(1) (2h) 2nd octave: d(0) = g(0)c(0) + g(1)c(-2) + g(2)c(-4) + g(3)c(-6) + g(4)c(-8) + g(5)c(-10) (2i) d(4) = g(0)c(4) + g(1)c(2) + g(2)c(0) + g(3)c(-2) + g(4)c(-4) + g(5)c(- 6) (2j) e(0) = h(0)c(0) + h(1)c(-2) + h(2)c(-4) + h(3)c(-6) + h(4)c(-8) + h(5)c(-10) (2k) e(4) = h(0)c(4) + h(1)c(2) + h(2)c(0) + h(3)c(-2) + h(4)c(-4) + h(5)c(- 6) (2l) 3rd octave: f(0) = g(0)e(0) + g(1)e(-4) + g(2)e(-8) + g(3)e(-12) + g(4)e(-16) + g(5)e(-20) (2m) g(0) = h(0)e(0) + h(1)e(-4) + h(2)e(-8) + h(3)e(-12) + h(4)e(-16) + h(5)e(-20) (2n) 4. Systolic Array (SA) Architecture for DWT The design of DWT-SA is based on computation schedule derived from Equations 2a-2n, which are the result of applying the pyramid algorithm for N=8 to the sixstage filter. As shown in Equations 2a-2n there are eight first octave coefficients computations, four second octave coefficients and two third octave computations The DWT-SA architecture comprisesof the following basic blocks: 4.1.The Filter Unit (FU) 4.2 The Storage Unit 4.3.The Control Unit (CU) 4.1. The Filter Unit (FU) Equations 1a-1b show that the high pass and low pass filter coefficients can be computed using a six stage multiply and accumulate digital filter, where partial results are computed at each stage separately and then progressively passed to the next higher stage for accumulation. Equations 1a-1b suggests that due to filter coefficient calculations spanning over 5 clock cycles, a temporary storage is required for the incoming data samples. The filter unit proposed for this architecture is a six-stage non-recursive FIR digitalfilter whose transfer functions for the high-pass andlow-pass components are shown in Equations 2a-2n.The systolic architecture of a six-stage filter is shown in Figure-2.The latency of each filter stage is 1. Figure 2: Systolic architecture of a six-stage filter The filter behaves in a systolic manner by computing partial products of more than one coefficient at a time. The first multiply and accumulate stage computes the first partial product and passes it to the second filter stage where it is added to the second partial product. That repetitive action continues until a complete DWT coefficient is out from the sixth filter stage. The filter consists of band select signal. If band selects is’1’ then it selects low pass coefficients. If band selects’0’then it selects high pass coefficients.The partial results are passed synchronously in a systolic manner from one cell to the adjacent cell. The filter cell consist therefore of only one multiplier, one adder and two registers to store high pass and low pass coefficients.The RTL schematic of filter unit is shown in Figure 3. In filter cell, a signed number multiplication problem occurs. The signed-number represents either positive, negative numbers or one positive and other negative numbers. To avoid this problem, the proposed filter cell consists of invert and xor operation as shown in Figure 4.The RTL schematic of filter cell is shown in Figure 5. Paper ID: SUB1539 43
  • 3. International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438 Volume 4 Issue 1, January 2015 www.ijsr.net Licensed Under Creative Commons Attribution CC BY Figure 3: The RTL schematic of filter unit S_D Dmul Sadd S_ Input S_ Hi_D S_L0_D Previous Input Cout Cin Mux 13 Multiplier Sm XOR S3 = Sa/s Sadd Mux 30 Comparator 30 Adder Invert Mux S0_Input Sm = S0 xor S Cout Substractor To the next stage Figure 4: The Proposed Filter Cell Figure 5: The RTL Schematic of Filter Cell 4.2 The Storage Unit Two storage units are used in the proposed architecture. • Input Delay Unit (ID) and • Register Bank (RB) Input Delay Unit (ID): The data registers and input delay used in theses storage units have been constructed from standard D-latch.Equations 2a-2n shows that the value of computed filter coefficient depends on the present as well as the five previous data samples. The negative time indexes in Equations 2 correspond to the reference starting time unit 0. Figure 6 shows the block diagram of the input delay unit. Delay 1 Delay 2 Delay 3 Delay 4 Delay 5 To Switch 1 To Switch 2 To Switch 3 To Switch 4 To Switch 5 To Switch 6 Input Figure 6: Input Delay Unit (ID) As shown in figure five delays are connected serially.At any clock cycle each delay passes its contents to its right neighbor which results in only five past values being retained.The inputof delayis applied to the switch.The RTL schematic of input delay unit isshown inFigure 7. Figure 7: The RTL Schematic of Input Delay Unit Register Bank (RB): In proposed architecture number of registers required is 26.Here, two 13 registers architecture are connected serially and constitute one register bank and are controlled by an overall clock.The output of one register is directly connected to theinput of the adjacent register,and thusthe register bank is implemented as shown in Figure 8and its RTL schematic is shown in Figure 9. Register 0 Register 1 Register 2 Register 25 Clock From filter bank To control unit Figure 8: Register Bank (RB) a(12:0)y(12:0) b(12:0) sel mux13 x(12:0)mulout(30:0) y(12:0) muls1224 in1(30:0) out1(29:0) U4 inv29 a(30:0)y(29:0) b(29:0) sel U3 mux30 x(30:0)sadd y(30:0) U5 xors30 Hi_D0(12:0) L0_D0(12:0) Bandselect Input(12:0) PREV(30:0) Output(30:0) a(30:0)y b(30:0) comp313 s addout(30:0)sadd cout x(29:0) y(30:0) adder311 ciny(30:0) d(30:0) coutsubt a(30:0)y(30:0) b(30:0) sel mux301 0014 1002 1002 1 1002 40000012 0009 40000012 3FFFFFED 40000012 00000012 3FFFFFED 0 40000012 0 40000042 1 0009 40000042 40000054 40000012 1 40000042 1 40000054 0 0 00000012 40000042 0 7FFFFFAC 40000054 40000054 40000054 7FFFFFAC 0 clk q(12:0) d(12:0) rst U1 latch13 clk q(12:0) d(12:0) rst U2 latch13 clk q(12:0) d(12:0) rst U3 latch13 clk q(12:0) d(12:0) rst U4 latch13 clk q(12:0) d(12:0) rst U5 latch13 clk q(12:0) d(12:0) rst U6 latch13 clk rst Output(12:0) Input(12:0) inout1(12:0) inout2(12:0) inout3(12:0) inout4(12:0) inout5(12:0) clko rst0 Paper ID: SUB1539 44
  • 4. International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438 Volume 4 Issue 1, January 2015 www.ijsr.net Licensed Under Creative Commons Attribution CC BY Figure 9:The RTL Schematic of Register Bank (RB) 4.3 Control Unit In DWT-SA architecture, schedule based on latency of 1 is proposed to meet the real time requirements in some applications.The computations are scheduled at the earliest possible clock cycle,and computed output samples are available one clock cycle after they have been scheduledas shown in Table 1.The delay is minimized through the pipeline facilitating real time operation.The computation schedule in table corresponds to a high hardware utilization of more than 85%.All the intermediate and the associated periods of activity are listed in Table 2. Table 1: Schedule for one complete set of computations Cycles High-pass Low-pass 1 b(0) c(0) 2 -- -- 3 b(2) c(2) 4 d(0) e(0) 5 b(4) c(4) 6 -- -- 7 b(6) c(6) 8 d(4) e(4) 9 b(0) c(0) 10 f(0) Low-pass Table 2: Activity periods for intermediate results Sample Available Cycles Life Period c(0) 1 1 to 12 c(2) 3 3 to 14 c(4) 5 5 to 16 c(6) 7 7 to 18 e(0) 12 12 to 38 e(4) 16 16 to 38 The complete design of the control unit for DWT-SA is shown in Figure 10.The control unit uses switch, decoder, and four FSM. The switching action is done by using FSM. State diagram is used to represent FSM.CU directs data from the Input Delay (ID) or the register bank (RB) to the filter unit (FU). The CU multiplexes data from the ID every second cycle, and from the RB in cycles 4, 6, and 8.In cycle2,6, CU remains idle, i.e.it does not allow any passage of data. Proper timing synchronization as well as enabling and disabling of the CU are insured by the CLK signal. The RTL schematic of control unit is shown in Figure 11. FSM 0 FSM 1 Clk Rst Rst Clk Rst Clk FSM 3 FSM 2 Decoder Y0 Y0 Y1 Y2 Y2 Y1 Y3 Y3 Clk Rst Clk Rst Output Figure 10: The Control Unit Control logic consists of four FSM. The switch is operated on this state diagram. According to that it accepts data from input delay and register bank. Figure 12 shows the state diagram for FSM0.Here, there are two states, S1 and S2.When reset is 1, it produces output 1.Otherwise it produces 0. Figure 11: The RTL Schematic of Control Unit s1 s2 y0<='0'; Rst ='1' y0<='1'; Figure 12: State Diagram for FSM0 Figure 13 shows the state diagram for FSM1. It consists of four states, S1, S2, S3, S4.When reset is 1, and the output is 0 for first clock. After that it goes to the next state, i.e. S2.At that time clock is incremented by 1.When clock cycle is less than 2,it produces output 0.But,when clock is greater than 2 then it goes to the next state. In S3 state, it produces output 1.This process is repeated for clock 3, 4, 5, 6, and 7. If clock is greater than 7 then it again comes to the S3 state and produces the output. clk q(30:0) d(30:0) rst clk q(30:0) d(30:0) rst clk rst clk q(30:0) d(30:0) rst clk q(30:0) d(30:0) rst clk q(30:0) d(30:0) rst clk q(30:0) d(30:0) rst clk q(30:0) d(30:0) rst clk q(30:0) d(30:0) rst clk q(30:0) d(30:0) rst clk q(30:0) d(30:0) rst clk q(30:0) d(30:0) rst clk q(30:0) d(30:0) rst clk q(30:0) d(30:0) rst Output1(30:0) Output2(30:0) Output3(30:0) Output4(30:0) Output5(30:0) Output6(30:0) Output7(30:0) Output8(30:0) Output9(30:0) Output10(30:0) Output11(30:0) Output12(30:0) Input(30:0) clko rsto Output(30:0) clk y0 rst U1 fsm clk y1 rst U2 fsm1 clk y2 rst U3 fsm2 clk y3 rst U4 fsm3 y0 y(3:0) y1 y2 y3 U5 cldecoder1 clk Rst Output(3:0) 0 0 0 0 0 0 0 1 0 0 0 0 0 2 0 1 0 0 0 2 Paper ID: SUB1539 45
  • 5. International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438 Volume 4 Issue 1, January 2015 www.ijsr.net Licensed Under Creative Commons Attribution CC BY Figure 13: State Diagram for FSM1 Figure 14 shows state diagram for FSM2.When reset is 0,state S2 produces output 0.This occurs when clock is less than 6.For the next clock i.e. Clock is greater than 6,the state S3 produce output 1.But when clock is greater than 7,then sate S4 changes its state to the previous state S3. Figure 14: State Diagram for FSM2 The next is FSM3, which consist of S1, S2, S3, and S4. Figure 15 shows the sate diagram for FSM3.When reset is 1, output y3 produces 0.In sate S2, when clock cycle is less than 8, then output is 0.But in the next clock cycle the output is high. If clock cycle is greater than 7, it changes the state S4 to S3 and produces output high. In all this FSM, outputs are connected to decoder. The output of decoder is connected to the select line of switch, by selecting particular select line; switching action of switch is done. And according to that switch, the data is applied to the filter cell. Figure 15: State Diagram for FSM3 5. Register Allocation In the DWT-SA architecture Control Unit (CU) and the Register Bank (RB) synchronize the availability of operands.There are two schemeswhich can be employed for this purpose, namely; • Forward Register Allocation (FRA) • The Forward-Backward Register Allocation(FBRA) The FRA method uses a set of registers which are allocated to intermediate data on the first comefirst serve basis. It does not reassign any registers to other operands once its contents have beenaccessed.The FBRA scheme is similar, except that once the operand stored has been used;there register is allocated to another operand.The FRA method is simpler,requires less control circuitry and permits easy adaptation of this architecture for coefficient calculation of more than 3 octaves.It results however in less efficientregister utilization.In our work we used FRA register allocation. 5.1 FRA Register Allocation To demonstrate the construction of the register allocation using the FRA approach,we will considerthe case of computing the coefficients c(0) and c(2).As shown in Table 3coefficient c(0) is computed in clock cycle one.Whereas coefficient c(2)is scheduled for computation in cycle 3.These two computed coefficients along with other four;c(4),c(6),c(8),c(10) are needed for thecomputation ofcoefficient(0).According to table e(0) computation is scheduled for cycle 4.However the six coefficients c(0),c(2), c(4),c(6),c(8),c(10) will not be available until cycle 12,and therefore the calculation of e(0) has to be scheduled for cycle 4+N,or 12.Similarly,coefficient e(4) is computed not in cycle 8,but 8+N,thus cycle16.The number of registers needed becomes apparent once one complete frame of computations has been scheduled. Table 3: FRA Register Allocation Cycle Com R1 R2 R3 R4 R5 R6 1 c(0) 2 c(0) 3 c(2) c(0) 4 e(0) c(2) c(0) 5 c(4) c(2) c(0) 6 g(0) c(4) c(2) c(0) 7 c(6) c(4) c(2) c(0) 8 e(4) c(6) c(4) c(2) 9 c(0) c(6) c(4) c(2) 10 c(0) c(6) c(4) 11 c(2) c(0) c(6) c(4) 12 e(0) c(2) c(0) c(6) 13 c(4) e(0) c(2) c(0) c(6) 14 g(0) c(4) e(0) c(2) c(0) 15 c(6) c(4) e(0) c(2) c(0) 16 e(4) c(6) c(4) e(0) c(2) 17 c(0) e(4) c(6) c(4) e(0) c(2) 18 c(0) e(4) c(6) c(4) e(0) 19 c(2) c(0) e(4) c(6) c(4) 20 e(0) c(2) c(0) e(4) c(6) 21 c(4) e(0) c(2) c(0) e(4) c(6) 22 g(0) c(4) e(0) c(2) c(0) e(4) Paper ID: SUB1539 46
  • 6. International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438 Volume 4 Issue 1, January 2015 www.ijsr.net Licensed Under Creative Commons Attribution CC BY 23 c(6) c(4) e(0) c(2) c(0) 24 e(4) c(6) c(4) e(0) c(2) 25 c(0) e(4) c(6) c(4) e(0) c(2) 26 c(0) e(4) c(6) c(4) e(0) 27 c(2) c(0) e(4) c(6) c(4) 28 e(0) c(2) c(0) e(4) c(6) 29 c(4) e(0) c(2) c(0) e(4) c(6) 30 g(0) c(4) e(0) c(2) c(0) e(4) 31 c(6) e(0) c(2) c(0) 32 e(4) c(6) e(0) c(2) 33 c(0) e(4) c(6) e(0) c(2) 34 e(4) c(6) e(0) 35 c(2) e(4) c(6) 36 e(0) c(2) e(4) c(6) 37 c(4) e(0) c(2) e(4) c(6) 38 g(0) c(4) e(0) c(2) e(0) e(4) Table 3: FRA Register Allocation (continued) Cycle R7 R8 R9 R10 R11 R12 R13 1 2 3 4 5 6 7 8 c(0) 9 c(0) 10 c(2) c(0) 11 c(2) c(0) 12 c(4) c(2) c(0) 13 c(4) c(2) c(0) 14 c(6) c(4) c(2) 15 c(6) c(4) c(2) 16 c(0) c(6) c(4) 17 c(0) c(6) c(4) 18 c(2) c(0) c(6) 19 e(0) c(2) c(0) c(6) 20 c(4) e(0) c(2) c(0) 21 c(4) e(0) c(2) c(0) 22 c(6) c(4) e(0) c(2) 23 e(4) c(6) c(4) e(0) c(2) 24 c(0) e(4) c(6) c(4) e(0) 25 c(0) e(4) c(6) c(4) e(0) 26 c(2) c(0) e(4) c(6) 27 e(0) c(2) c(0) e(4) c(6) 28 c(4) e(0) c(2) c(0) e(4) 29 c(4) e(0) c(2) c(0) 30 c(6) c(4) e(0) c(2) 31 e(4) c(6) c(4) e(0) c(2) 32 c(0) e(4) c(6) c(4) e(0) 33 c(0) e(4) c(6) e(0) 34 c(2) c(0) e(4) c(6) 35 e(0) c(2) c(0) e(4) 36 e(0) c(2) c(0) e(4) 37 e(0) c(2) e(4) 38 c(6) e(0) c(2) 6. DWT-SA Architecture The proposed architecture is shown in Figure 16 and its RTL schematic is shown inFigure 17.It is obtained by systematic analysis of the FRA register allocation in Table 3in conjunction with the filter equations 1a and 1a.Delayconsist of the DWT-SA architecture consists of the latency period necessary to fill up the filter for the first time, plus the delay through the registers asdescribed in Table 1.The output coefficients are obtained from the final filter stage. The proposed DWT-SA architecture uses only one filter. The architecture is optimized in hardware by using only a single multiplier and adder set in each filter celltogenerate all high pass and low pass coefficient.The coefficients which are present at the output of filter are stored in register. The control logic controls the switching action of switch.The data which are coming from delay input and register bank is controlled by switch.According to the position of switch,one of the data is select and performs filtering operations. This process is continued up to the 53 clock cycle. Figure 16: DWT-SA Architecture To select the particular output coming from filter unit, one of the small unit is used. When select line is zero, it selects one data. Force zero blocks is used to set the values zero. For controlling of this unit FSM is used. The select line is connected to the FSM. When reset is zero, it produces output. Otherwise it produces 0.This results in a simple and efficient systolic implementation for the computation of1- DWT and hence is suitable for VLSI implementation. Paper ID: SUB1539 47
  • 7. International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438 Volume 4 Issue 1, January 2015 www.ijsr.net Licensed Under Creative Commons Attribution CC BY Figure 17: RTL schematic of DWT-SA architecture The RTL view of DWT-SA for Daubechies3 is shown in Figure 18.This systolic array architecture is designed and implemented using quartus II by device: ‘Cyclone;EP2C20F484C7’.Here two elements are used to implement DWT-SA architecture.We get the same RTL view for Haar and Coiflets wavelets Figure 18: RTL view of DWT-SA for Daubechies3 wavelet 7. Simulation Results of DWT-SA Architecture for Daubechies3, HaarandCoiflets1Wavelet MATLAB is the software which is used in our project to verify the result with VHDL simulation. The high pass and low pass coefficients are calculated from MATLAB which are in decimal form. We convert these into binary and padding some zeroes in it. Because the filter cell and multiplexer is designed for31 bit, 13 bit respectively. Out of this13 bit, 12 bits are data bits and 1 MSB bit is signed bit. The signed bit represents the positive or negative number. All the implementation and simulation is done in Active HDL7.1. Simulated waveform gives result for different band select. When band select is ‘1’ the low pass coifficients are selected and gives Approximation coefficients. When band select is ‘0’ the high pass coifficients are selected and gives Details coefficients. The result of DWT-SA is in hexadecimal format. Table 4 shows low pass coefficients of daubechies3,haar and coiflets1 wavelet in hexadecimal form. Table 4: Low pass Coifficients of db3, haar and coiflets1 wavelet in Hex form Low pass coefficients db3 wavelet Haar wavelet coiflets1 wavelet LO_D0 0001 0012 0000 LO_D1 1002 0012 1002 LO_D2 1003 0 000A LO_D3 000B 0 0015 LO_D4 0014 0 0008 LO_D5 0008 0 1002 Table 5, Table 6 and Table 7 shows the result in terms of approximation coefficient in hexadecimal form of daubechies3,haar and coiflets1 wavelet. Table 5: Approximation Coefficients of db3wavelet When band select is ‘1’ Approximation coefficients of db3 First stage output Second stage output Third stage output ca1 ca11 ca111 -1 9 -54 7 -42 383 23 161 -6C9 24 51C 7AD8 1C E0 700 0 0 0 Table 6: Approximation Coefficients of Haar wavelet When band select is ‘1’ Approximation coefficients of Haar First stage output Second stage output Third stage output ca1 ca11 ca111 24 510 88B0 24 288 Table 7: Approximation Coefficients of coiflets1 wavelet When band select is ‘1’ Approximation coefficients of coiflets1 First stage output Second stage output Third stage output ca1 ca11 ca111 -2 4 -8 1D B2 -31A 23 531 8065 25 160 -5E 6 -C 18 0 0 0 Table 8 shows high pass coefficients ofdaubechies3,haar and coiflets1 wavelet in hexadecimal form. Table 8: High pass coefficients of db3, haar and coiflets1 wavelet in Hex form High pass coefficients db3 wavelet Haar wavelet coiflets1 wavelet HO_D0 1008 1012 0002 HO_D1 0014 0012 0008 HO_D2 100B 0 1015 HO_D3 1003 0 000A HO_D4 0002 0 0002 HO_D5 0001 0 0000 When band select is ‘0’ the high pass coifficients are selected and gives Details coefficients.Table 9, Table 10 and Table 11 shows the result in terms of detail coefficients in Bandselect Output(30:0) D0(12:0) D1(12:0) D2(12:0) PREV(30:0) U1 filtercell1 Lo_D0(12:0) Hi_D0(12:0) bandselect Prev(30:0) Bandselect Output(30:0) D0(12:0) D1(12:0) D2(12:0) PREV(30:0) U2 filtercell1 Lo_D1(12:0) Hi_D1(12:0) Bandselect Output(30:0) D0(12:0) D1(12:0) D2(12:0) PREV(30:0) U3 filtercell1 Lo_D2(12:0) Hi_D2(12:0) Bandselect Output(30:0) D0(12:0) D1(12:0) D2(12:0) PREV(30:0) U4 filtercell1 Lo_D3(12:0) Hi_D3(12:0) Bandselect Output(30:0) D0(12:0) D1(12:0) D2(12:0) PREV(30:0) U5 filtercell1 Lo_D4(12:0) Hi_D4(12:0) Bandselect Output(30:0) D0(12:0) D1(12:0) D2(12:0) PREV(30:0) U6 filtercell1 Lo_D5(12:0) Hi_D5(12:0) clk q(12:0)d(12:0) rst U7 latch13 clk q(12:0)d(12:0) rst U8 latch13 clk q(12:0)d(12:0) rst U9 latch13 clk q(12:0)d(12:0) rst U10 latch13 clk q(12:0)d(12:0) rst U11 latch13 rst clk clk Output(30:0)filterin(30:0)R0(30:0) rst R1(30:0) R10(30:0) R12(30:0) R2(30:0) R3(30:0) R4(30:0) R5(30:0) R6(30:0) R7(30:0) R8(30:0) R9(30:0) U12 regbank1 clk Output(30:0)filterin(30:0) R0(30:0) rst R1(30:0) R10(30:0) R12(30:0) R2(30:0) R3(30:0) R4(30:0) R5(30:0) R6(30:0) R7(30:0) R8(30:0) R9(30:0) U13 regbank1 a(12:0) y(12:0) b(30:0) c(30:0) d(30:0) sel(3:0) Rst Output(3:0) clk U19 dwtcl2 a(12:0) y(12:0) b(30:0) c(30:0) d(30:0) sel(3:0) U23 sw3 U14 sw3 a(12:0) y(12:0) b(30:0) c(30:0) d(30:0) sel(3:0) U15 sw3 a(12:0) y(12:0) b(30:0) c(30:0) d(30:0) sel(3:0) U16 sw3 a(12:0) y(12:0) b(30:0) c(30:0) d(30:0) sel(3:0) U17 sw3 a(12:0) y(12:0)b(30:0) c(30:0) d(30:0) sel(3:0) U18 sw3 clk datain(12:0) rstin rstout U20 fsmforin1 a(30:0)y(30:0) sel U24 forcezero1 clkfzc rst U25 forcezerofsm1 outsq(30:0) out0(30:0) sel(3:0) out1(30:0) out2(30:0) U21 demux301 stage0(30:0) stage1(30:0) stage2(30:0) 1 00000000 0000 00011008 00000000 00011008 1 00000000 1 00000000 0000 10020014 00000000 10020014 1 00000000 0000 1003100B 00000000 1003100B 1 00000000 0000 000B1003 00000000 000B1003 1 00000000 0000 00140002 00000000 00140002 1 00000000 0000 00080001 00000000 00080001 0 0001 0000 0 0 00010001 0 0 00010001 0 0 00010001 0 0 00010001 0 0 0 0 00000000 00000000 00000023 0 00000009 00000000 00000000 00000007 00000000 40000001 00000000 00000000 00000000 00000000 00000000 0 00000000 00000000 00000000 0 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0001 0000 000000070000000700000000 4 0 4 0 0000 0000 00000023 00000023 00000009 4 0001 0000 400000014000000100000000 4 0001 0000 000000000000000000000000 4 0001 0000 000000000000000000000000 4 0001 0000 000000000000000000000000 4 0 0000 0 0 00000000 00000000 0 0 0 0 00000000 00000000 4 00000000 00000000 00000000 00000000 00000000 Paper ID: SUB1539 48
  • 8. International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438 Volume 4 Issue 1, January 2015 www.ijsr.net Licensed Under Creative Commons Attribution CC BY hexadecimal form of daubechies3, haar and coiflets1 wavelet in hexadecimal form. Table 9: Details coefficients of db3 wavelet When band select is ‘0’ Details coefficients of db3 First stage output Second stage output Third stage output cd1 cd11 cd111 C 100 1110 -2 05E 086E 1 0BA 114 -B -1E 75 3 3 3 0 0 0 Table 10: Details coefficients of Haar wavelet When band select is ‘0’ Details coefficients of Haar First stage output Second stage output Third stage output cd1 cd11 cd111 0 0 0 0 0 0 Table-11:Details coefficients of coiflets1 wavelet When band select is ‘0’ Details coefficients of coiflets1 First stage output Second stage output Third stage output cd1 cd11 cd111 -A 4E 34E -1 6F 9D 1 D5 906 -9 2 4 2 0 0 0 0 0 The Systolic array architecture of DWT is synthesized, placed and routed for cyclone device EP2C20F484C7. The levels of logic cells reported are 8% for Daubechies3, 5% for Haar, 2% for coiflets1. The minimum clock period reported by the synthesis tool after placing and routing is given in Table 12. The Area Analysis reports of each wavelet for Discrete Wavelet Transform Systolic array architecture implementation are given in Table 13. Table 12: Time Analysis for device EP2C20F484C7 Parameters Daubechies3 haar Coiflet1 Clock Period(ns) 53.508 ns 53.743 ns 51.892 ns Clock Speed(MHz) 18.69 MHz 18.61 MHz 19.27 MHz Table 13: Area Analysis for device EP2C20F484C7 Parameters Daubechies3 haar Coiflet1 Total Logic Elements 1515, 8% 955,5% 1407,2% Total Combinational Functions 1320, 7% 907,5% 1212,6% Dedicated Logic Registers 336, 2% 134,1% 336, 2% LUT-only LCs 1179(1) 821(0) 1071(1) Register-Only LCs 195(0) 48(0) 195(0) LUT/Register LCs 141(0) 86(0) 141(0) For the verification of VLSI results, MATLAB is used. The approximation and detail coefficient are obtained from MATLAB.Figure 19shows the simulation waveform for approximation coefficients and Figure 20shows the simulation waveform for details coefficients for Daubechies3 wavelet. Figure 19: Simulation result of Approximation coefficients for Daubechies3 Figure 20: Simulation result of Details coefficients for Daubechies3 Figure 21 and Figure 22 shows the simulation waveform for approximation coefficients and details coefficients of Haar wavelet. Paper ID: SUB1539 49
  • 9. International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438 Volume 4 Issue 1, January 2015 www.ijsr.net Licensed Under Creative Commons Attribution CC BY Figure 21: Simulation result of Approximation coefficients for Haar Also, Figure 23 and Figure 24shows the simulation waveform for approximation coefficients and details coefficients of Coiflets1 wavelet. Figure 22: Simulation result of Details coefficients for Haar Figure 23: Simulation result of Approximation coefficients for Coiflets1 Figure 24: Simulation result of Details coefficients for Coiflets1 8. Conclusion The proposed DWT-SA architecture computes N coefficients in N clock cycles and achieves real time operation by executing computations of higher octave coefficients in between the first octave coefficient computations. The first octave computations are scheduled for every N/4cycles, while the second and third octaves are scheduled every N/2 and N clock cycles respectively. The main objective of this work is to develop a design resource for designers to implement systolic array architecture of DWT for various wavelets according to the design need such as clock speed, area and time. Systolic array architecture has efficient hardware utilization and it works with data streams of arbitrary size. The design is cascadable, for computation of one, two and three decomposition level. Systolic array Architecture is implemented using Very High Speed Integrated Circuit Paper ID: SUB1539 50
  • 10. International Journal of Science and Research (IJSR) ISSN (Online): 2319-7064 Index Copernicus Value (2013): 6.14 | Impact Factor (2013): 4.438 Volume 4 Issue 1, January 2015 www.ijsr.net Licensed Under Creative Commons Attribution CC BY (VHSIC) Hardware Description Language (VHDL) and then are simulate using Active HDL. References [1] N. Jayant, "High Quality Networking of Audio-Visual Information", IEEE Communications Magazine, pp. 84-95, September 1993. [2] E. A. Fox, "Advances in Interactive Digital Multimedia Systems", IEEE Computer Magazine, pp. 9-21, October 1991. [3] P. H. Ang, P. A. Ruetz and D. Auld, "Video Compression Makes Big Gains", IEEE Spectrum Magazine, pp. 16-19, October 1991. [4] A. K. Jain, "Image Data Compression: A Review", Proceedings of the IEEE, Vol. 69, No. 3, pp. 349- 389, March 1981. [5] K. R. Rao and P. Yip, Discrete Cosine Transform - Algorithms, Advantages, Applications. Academic Press: San Diego, 1990. [6] H. G. Musmann, P. Pirsch and H. J. Grailerr, "Advances in Picture Coding", Proceedings of the 1EEE, Vol. 73, No. 4, pp. 523-548, April 1985. [7] A. bl. Netravali and B. G. Haskell, Digital Pictures, Plenum: New York, 1989. [8] FBI Systems Technology Unit: "Gray Scale Fingerprint image Compression Status Report, Nov. 4 1991. [9] Chao-Tsung Huang, Po-Chih Tseng, and Liang-Gee Chen,” Analysis and VLSI Architecture for 1-D and 2- D Discrete Wavelet Transform”, IEEE Transactions on signal processing, vol. 53, No. 4, April 2005 [10]Chih-Chi Cheng, Chao-Tsung Huang, Ching-Yeh Chen, Chung-JrLian, and Liang-Gee Chen,” On-Chip Memory Optimization Scheme for VLSI Implementation of Line-Based Two-Dimentional Discrete Wavelet Transform”, IEEE Transactions on circuits and systems for video technology, vol. 17, no. 7, July 2007 [11]XinTian, Lin Wu, Yi-Hua Tan, and Jin-Wen Tian,” Efficient Multi-Input/Multi-Output VLSI Architecture for Two-Dimensional Lifting-Based Discrete Wavelet Transform”, IEEE transactions on computers, vol. 60, no. 8, August 2011 [12]Sze-Wei Lee, Soon-Chieh Lim,” VLSI Design of a Wavelet Processing Core”, IEEE transactions on circuits and systems for video technology, vol. 16, no. 11, November 2006 [13]Chao Cheng, Keshab K. Parhi,” High-Speed VLSI Implementation of 2-D Discrete Wavelet Transform”, IEEE transactions on signal processing, vol. 56, no. 1, January 2008 Author Profile Ms. RashmiPatil received her B. Eng. Degree in Electronics & Communication from S.S.G.M.C.E. Shegaon, India in 2010 and M.Tech degree in Electronics from R.T.M.N.U. Nagpur, India. Currently she is a research scholar in B.D.C.O.E., Sevagram, India. Her area of interest is applied electronics, VLSI, VHDL, and Low Power optimization. Dr. M. T. Kolte has completed his B.Tech, M.E., and Ph.D in Electronics and Telecommunication. He is working as Head of Dept.in M.I.T.C.O.E., Pune, India. He has presented and published many papers in National and International Conference. Paper ID: SUB1539 51