SlideShare a Scribd company logo
1 of 14
Download to read offline
120 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 61, NO. 1, JANUARY 2014
Memory Footprint Reduction for Power-Efficient
Realization of 2-D Finite Impulse Response Filters
Basant K. Mohanty, Senior Member, IEEE, Pramod K. Meher, Senior Member, IEEE,
Somaya Al-Maadeed, Senior Member, IEEE, and Abbes Amira, Senior Member, IEEE
Abstract—We have analyzed memory footprint and combina-
tional complexity to arrive at a systematic design strategy to derive
area-delay-power-efficient architectures for two-dimensional (2-D)
finite impulse response (FIR) filter. We have presented novel block-
based structures for separable and non-separable filters with less
memory footprint by memory sharing and memory-reuse along
with appropriate scheduling of computations and design of storage
architecture. The proposed structures involve times less storage
per output (SPO), and nearly times less energy consumption per
output (EPO) compared with the existing structures, where is the
input block-size. They involve times more arithmetic resources
than the best of the corresponding existing structures, and produce
times more throughput with less memory band-width (MBW)
than others. We have also proposed separate generic structures for
separable and non-separable filter-banks, and a unified structure
of filter-bank constituting symmetric and general filters. The pro-
posed unified structure for 6 parallel filters involves nearly
times more multipliers, times more adders, less
registers than similar existing unified structure, and computes
times more filter outputs per cycle with times less MBW than
the existing design, where is FIR filter size in each dimension.
ASIC synthesis result shows that for filter size (4 4), input-block
size , and image-size (512 512), proposed block-based
non-separable and generic non-separable structures, respectively,
involve 5.95 times and 11.25 times less area-delay-product (ADP),
and 5.81 times and 15.63 times less EPO than the corresponding
existing structures. The proposed unified structure involves 4.64
times less ADP and 9.78 times less EPO than the corresponding
existing structure.
Index Terms—Block processing, 2-dimensional (2-D) finite im-
pulse response (FIR), digital filters, VLSI architecture.
I. INTRODUCTION
TWO-DIMENSIONAL (2-D) digital filters are frequently
used in 2-D signal processing as well as the image and
video processing applications such as image enhancement,
Manuscript received December 09, 2012; revised February 16, 2013;
accepted March 31, 2013. Date of publication June 17, 2013; date of current
version January 06, 2014. This work was supported by an internal grant from
Qatar University (QUUG-CENG-DCS-11/12-7). The statements made herein
are solely the responsibility of the authors. This paper was recommended by
Associate Editor F. Clermidy.
B. K. Mohanty with the Department of Electronics and Communication Engi-
neering, Jaypee University of Engineering and Technology, Raghogarh, Guna,
Madhy Pradesh 473226, India (e-mail: bk.mohanti@juet.ac.in).
P. K. Meher is with Institute for Infocomm Research, 138632 Singapore
(e-mail: pkmeher@i2r.a-star.edu.sg).
S. Al-Maadeed with Department of Computer Science and Engineering,
Qatar University, Doha, Qatar (e-mail: s_alali@qu.edu.qa).
A. Amira with School of Computing, University of West Scotland, Paisley,
Scotland, UK, (e-mail: abbes.amira@uws.ac.uk).
Digital Object Identifier 10.1109/TCSI.2013.2265953
image restoration [1], template matching [2], face recognition,
feature extraction for bio-metric systems [3]–[5], and video
communication etc. The 2-D FIR filters are more popularly
used compared to its infinite impulse response (IIR) counterpart
due to their numerical stability and simplicity of design. The
system function of 2-D FIR filter is often non-separable, while
in a few cases, it is separable when impulse response
is expressed as . The non-separable
and separable system functions are, respectively, given as:
(1)
(2)
where is the impulse response matrix of the non-sepa-
rable 2-D FIR filter of size while and
are the impulse responses of 1-D FIR filters used for row-wise
and column-wise processing of 2-D input.
Block diagrams of conventional realization of non-separable
and separable 2-D FIR filters are shown in Fig. 1. As shown
in this figure, both these filter structures consist of two types of
hardware components: (i) the combinational component and (ii)
the memory or storage component. The combinational compo-
nent consists mainly of the arithmetic circuits along with some
steering logic like multiplexors and demultiplexers, while the
storage component consists of transposition buffers and/or shift-
registers to provide appropriate data to combinational units. The
non-separable structure uses shift-registers to introduce the nec-
essary row-delays for the processing of intermediate data while
the separable structure uses shift-registers for transposition of
intermediate data. We can find from (1) that, a non-separable
2-D FIR filter of size involves shift-regis-
ters (SRs) of size each, registers (for row-column
processing), and multipliers and adders to compute one
filter output per cycle. Similarly, we can find from (2) that, the
separable 2-D filter of size involves SRs of
words each, registers, multipliers and adders to
compute one filter output per cycle. Combinational and memory
(register) complexities of full-parallel non-separable and sepa-
rable filters are given in Table I. Since, the image size is
higher than the filter-size by more than an order of mag-
nitude in most of the image-processing applications, memory
becomes the dominant component of hardware complexity of
2-D FIR structures, and consumes major amount of chip-area
and total power consumption.
1549-8328 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
MOHANTY et al.: MEMORY FOOTPRINT REDUCTION FOR POWER-EFFICIENT REALIZATION 121
Fig. 1. Conventional structure of 2-D FIR filter. (a) Non-separable method, (b)
Separable method.
TABLE I
COMBINATIONAL AND MEMORY COMPLEXITY OF FULL-PARALLEL SEPARABLE
AND NON-SEPARABLE 2-D FIR FILTERS
Some systolic architectures have been suggested for VLSI
implementation of 2-D FIR filters to achieve high-throughput
and low-latency implementation [6]–[11]. Recently, some effi-
cient structures have been proposed for 2-D IIR filters [13]–[17]
and shown that non-separable 2-D FIR filters can also be real-
ized efficiently using those structures. In all these existing de-
signs, systolization [18] of the structure is considered as the
major issue, and a substantially large number of delay elements
are placed in the data-path to avoid global communication. Sim-
ilarly, the symmetry of impulse response matrix has been ex-
ploited to reduce the hardware and time complexities of the
structures [12]–[17]. In the last four decades, several design ap-
proaches have been suggested for reducing the arithmetic com-
plexity of one-dimensional (1-D) FIR filters [19]–[28]. All those
methods are now quite mature, and can be applied to reduce the
complexity of 1-D filters, which could be used for the imple-
mentation of 2-D FIR filters as well. On the other hand, memory
complexity constitutes the dominant part of overall area com-
plexity of these structures, and plays a significant role in power-
efficient realization of 2-D FIR filters. Keeping that in view, in
this paper, we present memory-centric designs of 2-D FIR filter
for the reduction of total memory usage, memory-reuse, reduc-
tion of memory band-width and memory-sharing, which could
lead to an area-delay-power-efficient structures.
Many image processing applications use 2-D filter banks
comprised of both separable and non-separable filters. One
such application is biometric system where Gabor filter bank is
used for feature extraction for face recognition and finger-print
matching [3]–[5]. The Gabor filter bank is generated for
different center frequencies and orientations. Consequently,
constituent filters are both separable and non-separable types
with symmetry as well as without symmetry. Recently, a
unified structure has been proposed in [17] for the realization
of filters with diagonal, four-fold rotational, quadrant and
octal symmetries. However, this structure does not favour the
realization a filter-bank since only one filter can be realized at
a time. Moreover, separable filters can not be realized using
this structure. The main objective of unified structure of [17] is
to achieve saving of arithmetic resources, but in this paper, we
aim at presenting a unified memory-efficient structure for filter
bank with separable and non-separable 2-D FIR filters.
The main contributions of this paper are:
• Analysis of memory footprint and exploration of possible
storage optimization to have a memory-efficient design
strategy for the implementation of 2-D FIR filter.
• Shared memory design for separable and non-separable
structures.
• Scheduling of computations and architecture design for
memory-reuse and memory band-width reduction in sepa-
rable filters.
• Unified structure of filter bank comprised of separable
and non-separable filters with low memory footprint per
output.
The rest of the paper is organized as follows: The proposed
design strategy is discussed in Section II and block formulation
of 2-D FIR filter is given in Section III. Proposed structures are
presented in Section IV for separable and non-separable filters.
Generic structure of filter-bank consisting of different types of
filters is presented in Section V. Hardware complexity and per-
formance of the proposed structures are discussed in Section VI.
Conclusion is presented in Section VII.
II. PROPOSED DESIGN STRATEGY
To arrive at the proposed design strategy we analyze here
memory complexities of possible configurations of 2-D FIR
filter. The system function of non-separable 2-D FIR filter
(given by (1)) can be written in a split form as:
(3a)
(3b)
Computations of (3a) and (3b) can be performed by a di-
rect-form or a transposed-form structure to have four possible
configurations such as fully-direct (direct-direct), fully-trans-
posed (transpose-transpose), hybrid-1 (direct-transpose), and
hybrid-2 (transpose-direct), for the realization of non-sepa-
rable as shown in Fig. 2 for . All these four
structures require the same number of arithmetic components
(multiplier and adders) and delay elements and cor-
responding to shift-registers and fixed registers, respectively)
except their locations in the data-path. Since, the bit-widths
of arithmetic units, buses and registers in the data-path are
different for input signals and intermediate signals, the overall
memory requirements of different configurations are different
in terms of number of storage bits, although all of them have the
same number of delay elements1. We have estimated memory
complexity of all the four type of structures for an input image
size 512 512 (i.e ) and filter length and
), and listed in Table II for comparison. We find that,
fully-direct structure has the lowest memory requirement than
others. Interestingly, the memory complexity of fully-direct
1The input image to be processed by the 2-D FIR filter consists of 8-bit pixels
while higher bit-width is required for intermediate signals.
122 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 61, NO. 1, JANUARY 2014
Fig. 2. Four different configurations for realization of 2-D FIR filter for . (a) Fully-direct structure, (b) Hybrid-1 structure (c) Hybrid-2 structure, and (d)
Fully-transposed structure, where represents a shift-register of words and represents a single register.
TABLE II
MEMORY COMPLEXITY FULLY-DIRECT, FULLY-TRANSPOSE, HYBRID-1 AND HYBRID-2 STRUCTURES. : FILTER-SIZE, : INPUT IMAGE-SIZE, : INPUT SIGNAL
BIT-WIDTH AND : INTERMEDIATE SIGNAL BIT-WIDTH
structure is independent of word-length of intermediate signals
since all the delay elements of this structure are placed on the
input path only. This is a very useful feature to be exploited for
memory footprint reduction in 2-D FIR filter structure.
A. Exploration of Memory-Reuse Possibilities
To explore the memory reuse possibilities, let us consider
the input data-flow of fully-direct structure for the computa-
tion of -th row of outputs
as shown in Fig. 3 for . The samples
required to compute a given filter output is shown in a pair of
curly braces and the corresponding filter output is shown at its
right. Each arrow shows the source (shift-register/register) of
samples. To compute each output of (4 4) filter, 16 input-sam-
ples corresponding to 4 rows and 4 columns of 2-D input are
required. Out of 4 rows (numbered as ),
the -th row is the current input-row and others are immediate
past rows. Memory-unit uses three shift-registers (SR-1, SR-2,
SR-3) to buffer 3 past -th rows of input
MOHANTY et al.: MEMORY FOOTPRINT REDUCTION FOR POWER-EFFICIENT REALIZATION 123
Fig. 3. Memory-unit data-flow of fully-direct structure for outputs , and . Redundant memory-values are shown
by blue color boxes.
samples. The required -th columns of sam-
ples of a particular-row are provided by the serial-in parallel-out
(SIPO) shift-register of words. As shown in Fig. 3, all
the 16 input-samples required to compute the filter output are
obtained from the memory-unit and current input. Therefore,
memory used by fully-direct structure to compute each output is
words, where SRs-unit provides words and fixed
register-unit provides 12 words. Memory band-width (MBW) of
the structure is 15 (read operations on SR-unit is 3, and 12 values
are obtained from register-unit to compute an output). This can
be generalized to find the number of memory words used by
fully-direct structure to be and MBW to be
.
As shown in Fig. 3, in total 64 input samples are
required to compute the outputs
. These 64 samples belong
to 4 rows and 7 columns
of the input
image. It can be noticed that out of 64 samples 28 sam-
ples are different from each other, while redundancy exists
in rest 36 samples. The redundant values corresponding to
filter-output
in the blue boxes in Fig. 3. These redundant-memory access
could be avoided by parallel computation of filter outputs
. In that case,
four current input samples
corresponding to four outputs need to be
available in each cycle. Four input values of each of the past
rows are retrieved from respective
shift-registers in every cycle. To realize this, each of the
shift-registers (of words) is required to be split into four
equal parts of words. Therefore, 12 input values are
obtained from the SR-unit in every cycle. Out of 28 unique
samples, 4 are obtained as input samples of current cycle and
12 samples are obtained from the SR-unit. The remaining
12 samples are obtained from four SIPO shift-registers. The
memory usage for parallel computation of 4 filter outputs is
which is the same as the memory usage
of the fully-direct structure for one output. MBW for parallel
computation of four filter outputs is . The
memory words required to compute a block of filter outputs
per cycle of the 2-D FIR filter of size can therefore
found to be and MBW can found to be
.
Interestingly, the storage space of fully-direct structure of
2-D FIR filter of size is the same as parallel compu-
tation of filter outputs due to memory reuse during parallel
computation. Only arithmetic resources increases proportion-
ately with throughput rate. This is an important feature which
can be utilized for area and power saving. The memory reuse
efficiency (MRE) of block-based structure can be estimated
as MRE [(total input words—actual memory usage)/ac-
tual memory usage], where total input words is times the
memory-usage of single-input single-output (SISO) structure.
For block-based structure with block size
.
Therefore, MRE of block-based parallel structure increases
proportionately with the block-size . Higher the MRE,
lesser is the memory requirement per output. Consequently,
the structure is more area-delay efficient compared with the
SISO structure. MBW of block-based structure increases by a
factor of instead of times, where is the
input block-size. This is mainly due to the number of redundant
samples . Higher the , better is the
MBW reduction. We have estimated the memory band-width
saving (MBS) using the formula MBS of
SISO structure) MBW of block-based structure of block-size
MBW of SISO structure). Therefore, we can have
. Since, varies with and , MBS
is higher for larger size filters with higher input block-size.
We have estimated , MBS and MRE of parallel 2-D FIR
structure for different filter sizes and input block-sizes .
The estimated values are listed in Table III to quantify the scope
of memory reuse. We can find from Table III that a block-based
realization enhances memory-reuse and reduces memory-band
width per output. Therefore, we have presented the block for-
mulation and its subsequent implementation of 2-D FIR filters.
B. Memory-Sharing in Generalized 2-D FIR Filter Structures
The fully-direct non-separable structure as well as the con-
ventional separable structure use shift-registers to store
words each, but shift-registers of fully-direct non-separable
structure stores input pixel values while those of separable
structure stores intermediate values. Due to the difference in
bit-width, a common shift-register unit cannot be shared by
these two structures. A different design approach for separable
filter is required where the shift-register unit stores the input
124 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 61, NO. 1, JANUARY 2014
TABLE III
MEMORY REUSE EFFICIENCY AND MEMORY BAND-WIDTH SAVING FOR
DIFFERENT SIZE FILTERS AND INPUT-BLOCK SIZE
signal only. For this purpose, we have used an efficient decom-
position scheme for 2-D separable filter in the following.
The input-output relation of separable 2-D FIR filter, given
by (2), can be written as:
(4)
Computation of (4) can be expressed in split form as:
(5a)
(5b)
and can be expressed as inner-products of a pair of -point
vectors and as
(6a)
(6b)
and and are given by
(7a)
(7b)
(7c)
(7d)
According to (5), input-vectors are fed to the row-filter in
column overlapped order, and the row-filter generates inter-
mediate values column-wise exactly in the same order as the
Fig. 4. Separable 2-D FIR filter structure (for based on proposed
decomposition scheme.
column-filter consumes intermediate values. Consequently,
transposition-unit in this case is comprised of fixed registers
instead of shift-registers.
Separable filter structure based on decomposition scheme of
(5) is shown in Fig. 4 for . The input samples are fed
as overlapped blocks for and
, where successive blocks of a column are over-
lapped by samples. The input-blocks are fed in row-serial
order from shift-register unit comprising of shift-reg-
isters of words each. We can find from Figs. 3 and 4 that
shift-register unit of fully-direct non-separable structure and the
separable structure based on this decomposition scheme has
the same number of shift-registers and they are of the same
size. Interestingly, the data-input and data-output formats of
both these shift-register units are identical. This favours the pos-
sible sharing of shift-register units of fully-direct non-separable
structure and separable structure. This leads to a generalized
structure for both non-separable and separable filters individ-
ually or in parallel configuration.
Data-flow of a shared shift-register unit is shown in Fig. 5
for , where the input-data requirement of both fully-di-
rect non-separable and separable structures is taken care of by
the shared shift-register unit. A shared shift-register unit not
only offers memory-saving, but also allows parallel realization
of both non-separable and separable filters. It is shown in the
later Sections that the parallel implementation of generalized
structure offers higher area-delay-power efficiency over sequen-
tial structure. Keeping these facts in view, we have outlined
here a systematic memory-centric design strategy to derive an
area-delay-power efficient structure for 2-D FIR filter.
• A fully-direct form structure should be used for non-sepa-
rable filter to have less storage-complexity.
• A block implementation of fully-direct structure should be
used for MBW reduction.
• Separable structure based on the proposed decomposition
algorithm could be derived for shift-register sharing with
non-separable filter structure.
• Appropriate algorithm partitioning and scheduling scheme
need to be used for separable filter to minimize memory
bandwidth and increase register sharing.
The register sharing property of proposed non-separable and
separable filter structures are exploited further to derive generic
structures. The proposed generic structures can be configured
MOHANTY et al.: MEMORY FOOTPRINT REDUCTION FOR POWER-EFFICIENT REALIZATION 125
Fig. 5. Data-flow in a shared shift-register unit of fully-direct structure and
separable structure.
for realization of a single filter of types (separable, non-sepa-
rable, symmetry (diagonal, four-fold rotational, quadrant)) or
parallel realization of any combination of these filters.
III. BLOCK FORMULATION OF 2-D FIR FILTERS
A. For Non-Separable Filter
Let us consider a non-separable filter which processes a block
of input samples and generates a block of outputs in every
cycle. The -th block of filter output of the -th row is
computed by relation;
(8)
where and are defined as
(9a)
(9b)
The intermediate vector is computed by product of input-
matrix and impulse-response vector , and given by
(10)
The input matrix is derived from -th row of the
input matrix of size and given by (11), shown at
the bottom of the page. is given by:
(12a)
From (9) and (11), we find is the inner-product
of -th row of matrix and , given by
(13)
B. For Separable 2-D FIR Filter
Let us consider a separable filter which processes a block of
input samples and generates a block of outputs in every
cycle. The -th block of filter output of the -th row is
computed in this case by two successive matrix-vector products
given by
(14a)
(14b)
and the input-matrix and intermediate-matrix are given
by (15) and (16), shown at the bottom of the page.
(11)
(15)
126 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 61, NO. 1, JANUARY 2014
Fig. 6. Proposed block-based structure for non-separable 2-D FIR filter for
block-size and filter-size .
IV. PROPOSED STRUCTURES
In this Section, we have derived two separate structures for
block implementation of non-separable and separable 2-D FIR
filters.
A. Block-Based Structure for Non-Separable 2-D FIR Filter
The computation of (8) and (10) are mapped into a fully-
direct parallel structure to derive the proposed block-based
structure for non-separable 2-D FIR filter. The proposed struc-
ture is shown in Fig. 6 for filter-size and block-size
. It consists of one memory-module and one arithmetic
module.
1) Memory-Module Design: The memory-module of Fig. 6
is comprised of one shift-register array and input-reg-
ister units (IRUs). The shift-register array further consists
of shift-registers of words each, where
. Proposed structure receives a block of input
samples and computes a block of outputs in each cycle.
All the samples of each input-block belong to the same
row and the inputs are fed to the structure block-by-block
and then row-by-row in serial order. The input-block
corresponding to the -th row of input image is fed
to the structure during -th cycle of -th set of
cycles, and the entire image is fed in cycles for
. shift-registers
of the shift-register array are arranged in groups referred to as
SR-block and each SR-block has shift-registers. As shown
in Fig. 6, SR-1,SR-2,SR-3,SR-4 constitute SR-block-1 and
SR-25,SR-26,SR-27,SR-28 constitute SR-block-7. One
SR-block stores one input row and the shift-register array
Fig. 7. Internal structure of -th input register unit (IRU) for and
.
stores rows of input. SRs of SR-block are connected
such that a block of samples transfer from one SR-block to
the adjacent SR-block to its right after every cycle. Therefore,
a block of inputs of a particular row are obtained
from each SR-block in every cycle, and input-blocks
corresponding to consecutive input rows are obtained
from the shift-register unit in every cycle.
Current input-block and past input-blocks avail-
able in the shift-register array are sent to IRUs to generate
input-matrix of size . During -th cycle of each
set of cycles, the first IRU receives the current block of input
, and the -th IRU receives an input-block from -th
SR-block. The -th IRU generates the input-matrix ,
for . According to (11), a block of
consecutive samples of -th row of input are required to
construct the matrix . Each IRU receives samples from
the corresponding SR-block during -th cycle and uses
past samples belonging to past input-blocks. The in-
ternal structure of -th IRU is shown in Fig. 7 for
and . It consists of registers, and produces
number of -point input-vectors , for cor-
responding to rows of .
2) Arithmetic-Module Design: The arithmetic module is
comprised of functional-units (FUs) and one adder tree (AT).
In each cycle, FUs of arithmetic module receive sets
of input-vectors from IRUs of storage-module such that
-th FU receives input-vectors generated by -th
IRU and it performs separate inner-product computation
with the -th row of impulse-response matrix to
obtain -point partial output-vector according to (10).
Internal structure of the FU is shown in Fig. 8(a). It consists of
(16)
MOHANTY et al.: MEMORY FOOTPRINT REDUCTION FOR POWER-EFFICIENT REALIZATION 127
Fig. 8. (a) Structure of -th functional unit (FU) for and .
(b) Structure of adder-block for and .
Fig. 9. Internal structure of inner-product cell (IPC) for .
inner-product cells (IPCs). Each IPC (shown in Fig. 9) per-
forms -point inner product of input-vector and weight-vector.
Finally, partial-output vectors of FUs are added together
in an adder-block (shown in Fig. 8(b)) according to (8) to obtain
one block of output in every cycle, where one clock period
, and are,
respectively, one multiplication time, addition time and one
full-adder delay. One complete row of output is obtained in
cycles and the entire output matrix of size in
cycles.
B. Block-Based Structure for Separable Filter
The proposed block-based structure for separable 2-D FIR
filter is shown in Fig. 10. It consists of one processing cell (PC)
and one transposition-unit (TU), where a PC consists of two
FUs. Structure of each FU is the same as the one shown in
Fig. 8(a) except that a constant vector is stored in each FU. In
this case, FU-1 and FU-2 store the constant-vectors and
, respectively. It processes the -th block of input
of the -th row of during the -th cycle of a set of
cycles, and produces an output block , where
and is the input block-size. One complete row of is pro-
cessed in cycles and the complete image in cycles. Pro-
Fig. 10. Proposed block-based structure for separable 2-D FIR filter for
and .
posed structure receives a block of input samples through
number of -point input-vectors where each of the
input-vectors is overlapped by samples. Input-vectors
of -th and -th cycles of the -th set input cycles are
shown in Fig. 10 for and . Components of each
input-vectors are shown in the rectangular box adjacent to its
left. The input-vectors of -th cycle form the matrix as
given in (15), where is the -th row of , for
, and .
rows of are fed to IPCs of FU-1 in parallel to perform
one matrix-vector multiplication with the constant vector
to calculate an -point intermediate-vector according to
(14a). From each intermediate-vector, one intermediate-matrix
(as given in (15)) is generated, such that is generated from
. TU generates the required rows of from .
We can find from (11) and (16) that the elements of and
satisfy similar property. Therefore, TU performs the same
function as the IRUs and its structure is identical with that of an
IRU (as shown in Fig. 7). The TU of separable structure gener-
ates rows , for ) of in parallel and feeds
those to FU-2 in parallel to perform one matrix-vector product
with constant-vector to compute a block of filter output
according to (16b).
V. GENERIC STRUCTURES
In this Section, we derive two separate generic structures for
non-separable and separable filter banks. Also we have pro-
posed a unified structure for realization of 2-D FIR filter-bank
comprised of non-separable and separable filters.
A. Generic Block-Based Structure for Non-Separable Filters
The coefficient matrices of non-separable filters can have va-
rieties of symmetry, e.g., diagonal, centro, four-fold rotational,
quadrant etc. These symmetries can be exploited to realize the
transfer functions with lesser number of multiplications. The
arithmetic module of proposed structure for non-separable fil-
ters could be optimized to take advantage of these symmetry
property. The storage-module of the structure can be interfaced
128 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 61, NO. 1, JANUARY 2014
Fig. 11. Proposed generic block-based structure for non-separable 2-D FIR fil-
ters, is the filter size and is the input block-size.
Fig. 12. Proposed generic block-based structure for separable 2-D FIR filters,
is the filter size and is the input block-size.
as a common unit with arithmetic modules of filters with and
without symmetry. This results in a generic structure shown in
Fig. 11. Each sub-module of arithmetic-module of generic struc-
ture represents arithmetic module of a constituent filter. The
arithmetic-module of each filter is enabled with a select signal
(EN , for ) to switch off the arithmetic module to
have power saving if the output of a particular filter is not re-
quired. The proposed generic structure can be used to realize
any of the four types of filters or a parallel combination of fil-
ters by selecting the arithmetic modules of respective filters
through the select signals. The proposed generic structure has
higher MRE and MBS than the non-separable structure for a
given block-size and filter-size due to common storage-unit.
The area-delay-product (ADP) and power consumed per output
(PPO) of the proposed generic structure in parallel configuration
is expected to be significantly less than separate implementation
of individual filters.
B. Generic Block-Based Structure for Separable Filters
The proposed generic structure for the realization of sepa-
rable filters with and without symmetry is shown in Fig. 12. The
structure is similar to proposed generic non-separable structure
except that the shift-register array is only common with PCs of
the processing unit. Proposed generic structure can be used to
realize a separable filter with and without symmetry in parallel.
Fig. 13. Data-flow of common shift-register array and data distribution block
for non-separable and separable generic structures.
C. Unified Structure for 2-D FIR Filter-Bank
The shift-register size and input to the shift-register array are
the same in the proposed separable and non-separable generic
structures. NL input samples are obtained from the shift-reg-
ister array and fed to the IRU-array of generic non-separable
structure as blocks of samples each, while the processing
unit of the generic separable structure is fed with blocks of
samples each. Therefore, the input-blocks of both non-separable
and separable filters can be obtained from the same shift-reg-
ister array. Outputs of the common shift-register array need to
be rearranged appropriately for the non-separable and separable
generic structures.
Data rearrangement of a common shift-register array is
shown in Fig. 13. The data-flow of an SR-block is shown in
blue color. The input-blocks shifting through the SR-block are
shown for the -th and -th cycle. The contents of the
first two locations of each SR of the SR-block are shown for
the -th and -th input cycles of the -th input row. The
rectangular dotted boxes show the content of one cell of the
SR-block comprised of 4 SRs corresponding to . A direct
flow of data from the shift-register unit meet the data-flow
requirement of non-separable generic structure while shift-reg-
ister output data are rearranged (shown by the data-distribution
block in Fig. 13) for the separable generic structure.
The proposed unified structure for 2-D FIR filter is shown
in Fig. 14. It has a common storage unit for both separable
and non-separable sections which can be used for the realiza-
tion of any one of four types of non-separable or two types
of separable filters. It also can be used for parallel realization
of any combination of separable or non-separable. It involves
memory-words and computes outputs
each of all the six filters in every cycle in its full-parallel con-
figuration. Although we have shown the unified structure for
6 parallel filters, it can be used for realization of more than 6
filters without any additional storage. The processing module
complexity (arithmetic-modules and PCs of each filter) only in-
creases proportionately with the number of parallel filters. This
is a major advantage for area-delay-power efficient realization
MOHANTY et al.: MEMORY FOOTPRINT REDUCTION FOR POWER-EFFICIENT REALIZATION 129
Fig. 14. Proposed block-based unified structure for 2-D FIR filter.
of large size filter banks consisting of separable and non-sepa-
rable filters.
VI. COMPLEXITIES AND PERFORMANCE CONSIDERATIONS
The proposed structure for non-separable filter consists of a
storage-module and an arithmetic module, while the proposed
separable structure consists of one shift-register array and one
processing cell, where the processing cell again consists of both
arithmetic and storage components. The storage module of non-
separable structure consists of one shift-register array and
IRUs. The shift-register array of both non-separable and sepa-
rable structures consists of SRs of words each. Each
IRU consists of registers. The arithmetic module of
non-separable structure consists of FUs and one adder-block
while the processing cell of separable structure consists of two
FUs and one TU (same as the IRU). Each FU consists of IPCs,
and each IPC consists of multipliers and one adder-tree (AT)
to add words. The adder-block consists of such ATs.
A. Complexity of Block-Based Structures for Separable and
Non-Separable Filters
The arithmetic-module of block-based non-separable struc-
ture involves multipliers, adders while each
processing cell of separable structure involves multipliers,
adders and registers. Apart from these, the
non-separable structure involves registers
and the separable structure involves regis-
ters. Both these structures compute outputs per cycle, where
one cycle period is and , re-
spectively for non-separable and separable structure, for
and are, respectively, one multiplier delay, adder delay and
full-adder delay2.
2The delay of AT increases by after each level of the tree. Since, we
have considered a (4 4) 2-D FIR filter to synthesize the proposed design, it
requires an AT of two levels. So the delay introduced by the adder tree becomes
. Therefore, the critical path is not affected much. For large order
filter one can use a pipeline adder-tree (PAT) instead. For simplicity if we can
consider the multiplier consisting of ripple carry adders, the multiplication time
is twice that of an addition time . To maintain a critical path of one
, then we need to introduce a pipeline stage after the first level of adder
tree using registers. Upto a filter of order 511 we do not require any extra
pipeline stage for word length 8 bit or more.
Fig. 15. (a) Memory reuse efficiency (MRE). (b) Memory band-width per
output (MBWPO). NS, S, NSG, SG and UNI, respectively, stands for non-sep-
arable, separable, non-separable generic, separable generic and unified.
TABLE IV
GENERAL COMPARISON OF MRE AND MBWPO OF PROPOSED STRUCTURES.
: INPUT-IMAGE WIDHTH/HEAIGHT, : FILTER-SIZE, : INPUT-BLOCK
LENGTH
B. Complexity of Generic Structures
Arithmetic module of non-separable generic structure com-
prises four sub-modules corresponding to four types of filters
where each sub-module represents arithmetic-module of the
corresponding filter. Sub-modules of symmetric filters involves
the same number of adders as those of sub-module of general
filter which is , and only differ by the number of mul-
tipliers. Sub-module of diagonal, four-fold and quadrant sym-
metry filters, respectively, involves
and multipliers. Arithmetic-module of non-separable
generic structure, therefore, involves multi-
pliers and adders. Processing unit of proposed
separable generic structure is comprised of two processing
cells (one general and one symmetric filter). It involves
130 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 61, NO. 1, JANUARY 2014
TABLE V
COMPARISON OF HARDWARE- AND TIME-COMPLEXITY OF THE PROPOSED STRUCTURES AND EXISTING STRUCTURES
TABLE VI
COMPARISON OF HARDWARE- AND TIME-COMPLEXITY OF PROPOSED NON-SEPARABLE GENERIC AND UNIFIED STRUCTURES AND EXISTING UNIFIED STRUCTURE
OF [17]
Fig. 16. Normalized storage per filter output (calculated for and
, NS, S, NSG, SG and UNI, respectively, stands for non-separable,
separable, non-separable generic, separable generic and unified.
multipliers, adders and registers. The
proposed non-separable generic structure, therefore, involves
multipliers and adders and
registers. It computes outputs of each of
the four filters in every cycle. Similarly, the proposed separable
generic structure involves multipliers, adders
and registers, and computes outputs of each
pair of filters in every cycle. The proposed unified structure has
one storage unit and one processing module which is comprised
of one non-separable section and one separable section. The
non-separable section represent arithmetic-module of proposed
non-separable generic structure whereas the separable section
represents one processing unit of separable generic structure.
Complexity of storage unit is the same as those of proposed
block-based non-separable structure. The proposed unified
structure, therefore, involves multipliers,
TABLE VII
HARDWARE AND TIME COMPLEXITIES OF PROPOSED GENERIC AND UNIFIED
STRUCTURES AND EXISTING UNIFIED STRUCTURE OF [17] FOR INPUT-IMAGE
SIZE , FILTER-SIZE
adders and
registers, and computes filter outputs of each of six filters
(four non-separable and two separable) in every cycle.
C. Memory Reuse Efficiency and Bandwidth Requirement
Using the definition of MRE and MBWPO given in
Section II, we have calculated MRE and MBWPO of pro-
posed structures according to expressions given in Table IV.
Using these expressions, we have estimated MRE and MBWPO
of the proposed structure for filter-size and image-size
. The estimated values are shown in graphs of Fig. 15.
The MRE of proposed structures increases with block-size.
MRE of unified structure is higher than others for a given
block-size. This is mainly due to memory-sharing by more
filters in the unified structure.
D. Performance Comparison
In [16], an efficient 2-D IIR filter (pole-zero) structure is
presented which can be modified for realization of 2-D FIR
MOHANTY et al.: MEMORY FOOTPRINT REDUCTION FOR POWER-EFFICIENT REALIZATION 131
TABLE VIII
COMPARISON OF POST-LAYOUT SYNTHESIS RESULTS OF THE PROPOSED STRUCTURES AND THE STRUCTURES OF [16], [17] AND [29] FOR
AND , USING TSMC 90 NM CMOS TECHNOLOGY LIBRARY, POWER ESTIMATED AT 20 MHZ FREQUENCY
filter by removing the all-pole structure from the pole-zero
structure. Accordingly, we have extracted an FIR filter struc-
ture from [16] and synthesized that for comparison. Hardware
complexity of these extracted structure along with those of
proposed structures and structure of [11] and [29] are listed in
Table V for comparison of complexities. We find from Table V
that proposed non-separable structure involves times more
multipliers and adders than the existing structures and equal
number of registers, but it offers times higher throughput and
times less MBWPO than others. Similarly, proposed sepa-
rable structure involves times more multipliers and adders
than those of [29] and the same number of registers, but offers
times higher throughput and nearly 2 times less MBWPO than
those of [29]. Proposed separable generic structure involves
times more multipliers and times more adders than
those of [29] and more registers, and offers times
higher throughput with nearly 4 times less MBWPO than those
of [29].
We have extracted a unified 2-D FIR filter structure from
the multimodal structure of [17] for comparison purpose. The
hardware and time complexities of this structure and the pro-
posed non-separable generic and unified structure are listed in
Table VI. Compared with structure of [17], proposed non-sep-
arable generic structure and unified structure involve nearly
times more multiplier times more adders and compute
and times more outputs, respectively. Proposed generic
structure involves less registers than that of [17] and
it has nearly times less MBWPO than other. Similarly, the
proposed unified structure involves less registers
and nearly times less MBWPO.
We have estimated hardware complexity of proposed non-
separable generic and unified structures and the unified struc-
ture of [17] for , block-size , 8 and for image-size
. The estimated values are listed in Table VII. We
can find from Table VII that, proposed non-separable generic
structure involves 13.6% less normalized multipliers, 34.7%
less normalized adders and 90% less MBWPO than those of
[17]. Proposed unified structure involves 24.27% less normal-
ized multipliers, 47.82% less normalized adders and 94% less
MBWPO than those of [17]. Unlike the existing unified struc-
ture, the proposed one does not require any steering logics for
signal switching. Note that, the proposed non-separable generic
structure is comprised of one general filter and three symmetric
filters, while the proposed unified structure is comprised of one
general non-separable and one general separable filter, and four
symmetric filters. But, unified structure of [17] is comprised of
only four symmetric filters. Proposed structures have several
advantage over the existing structures, (i) provide filter-banks
for non-separable and/or separable filters with symmetry and/or
without symmetry which could be used in many image pro-
cessing applications, (ii) can be configured for implementation
of any one filter of the filter-bank or parallel configuration of
any of the filters of the filter-bank, and (iii) can be easily scaled
for high-throughput.
We have estimated normalized storage per filter output (SPO)
of proposed structures and the existing structure of [11], [16],
[17], [29] for and and shown in
the graph of Fig. 16. We find that proposed structures involves
less normalized SPO than the existing structure. This is mainly
due to memory-reuse efficiency and memory-sharing of the pro-
posed structures.
E. ASIC Synthesis Result
We have coded the proposed designs in VHDL for filter size
, block-size and image-size (512 512), and
synthesized using Synopsys tool. We have also synthesized
non-separable 2-D FIR filter structures extracted from IIR
structure of [16] and unified structure of [17] and separable
structure of [29]. We have used Wallace-tree based generic
Booth-multiplier of Synopsys DesignWare building blocks
library for all the designs. Shift-registers and registers of all
designs are synthesized using D-FF (flip-flop). We have consid-
ered input signal width -bit and all intermediate signals
and output signal width -bit. We have set switching
132 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 61, NO. 1, JANUARY 2014
activity toggle rate 0.25 and static probability is 0.5. The netlist
file obtained from the Synopsys Design Compiler is processed
in IC Compiler. After place, route and clock synthesis area, time
and power (leakage and dynamic) reported by the IC Compiler
are listed in Table VIII for comparison. Power consumption of
all the synthesized designs are estimated for 20 MHz clock.
Due to memory footprint reduction, the area, leakage power
and dynamic power consumption of the proposed structures
do not increase proportionately with the number of outputs.
Consequently, proposed structures involve significantly less
area-delay-product (ADP3) and consume less energy per output
(EPO4). Compared with [16], proposed generic non-separable
structure involves 11.25 times less ADP and 15.63 times less
EPO. The proposed generic separable structure involves 4.85
times less ADP and 5.53 times less EPO than those of [29].
Compared with [17], proposed unified structure involves 4.64
times less area and 9.78 times less EPO.
VII. CONCLUSION
We have analyzed memory footprint and combinational
complexity of 2-D FIR structures to arrive at a systematic
design strategy to derive area-delay-power-efficient architec-
tures. Based on that we have presented novel block-based
separable and non-separable structures, generic structures for
separable and non-separable filter-banks and unified structure
for concurrent realization of both separable and non-separable
filter-banks. It is shown that storage requirement of proposed
structures does not change with input block-size . Similarly,
the storage size of generic non-separable structure is indepen-
dent of the number of parallel filters in a filter-bank. It
increases marginally by words in case of generic
separable and unified structures, where is the filter size.
Proposed structures, therefore, offer higher memory reuse
efficiency and MBW reduction for higher values of and .
This reduces SPO and PPO of proposed structures.
Compared with the existing structures, proposed block-based
separable and non-separable structures involve the same
number of storage words, and proportionately the same or
less number of arithmetic resources than the corresponding
best of existing structures, and compute PL times more filter
outputs per cycle with PL times less MBW. The proposed
unified structure with 4 non-separable filters and 2 separable
filters involve nearly times more multipliers, times
more adders, less registers than existing unified
structure, and computes times more filter outputs per cycle
with times less MBW than the other. ASIC synthesis result
for filter-size (4 4), input-block size and image-size
(512 512) shows that proposed block-based non-separable
and generic non-separable structures, respectively, involve
5.95 times and 11.25 times less ADP, and 5.81 times and 15.63
times less EPO than the corresponding existing structures. The
proposed unified structure involves 4.64 times less ADP and
9.78 times less EPO than the corresponding existing structure.
When the inner-product computation involved in FIR filtering
3ADP area data-arrival time(DAT)/outputs per cycle
4power clock-period/number of outputs per cycle
is realized by multiplier-less designs to reduce the combina-
tional-complexity that may require additional memory leading
to overall increase in memory complexity. But, the advantage
gained by the proposed memory footprint reduction technique
is not affected by such change of method of implementation of
inner-products.
REFERENCES
[1] D. E. Dudgeon and R. M. Mersereau, Multidimensional Digital Signal
Processing. Englewood Cliffs, NJ, USA: Prentice-Hall, 1984.
[2] M. A. Sid-Ahmed, Image Processing: Theory, Algorithms, and Archi-
tectures. New York, NY, USA: McGraw-Hill, 1995.
[3] T. Barbu, “Gabor filter based face recognition technique,” in Proc.
Rmanian Acad. Ser. A, 2010, vol. 11, no. 3/2010, pp. 277–283.
[4] S. E. Grigorescu, N. Petkov, and P. Kruizinga, “Comparison of texture
features based on Gabor filters,” IEEE Trans. Image Process., vol. 11,
no. 10, pp. 1160–1167, Oct. 2002.
[5] W. Li, K. Mao, H. Zhang, and T. Chai, “Designing compact Gabor
filter banks for efficient texture feature extraction,” in Proc. 11th Int.
Conf. Contr., Automation, Robotics Vis., Singapore, Dec. 7–10, 2010,
pp. 1193–1197.
[6] M. A. Sid-Ahemed, “A systolic realization for 2-D digital filters,” IEEE
Trans. Acoustic Speech Signal Process., vol. 37, pp. 560–565, Apr.
1989.
[7] N. R. Sanbhag, “An improved systolic architecture for 2-D digital fil-
ters,” IEEE Trans. Signal Process., vol. 39, pp. 1195–1202, May 1991.
[8] B. K. Mohanty and P. K. Meher, “Cost-effective novel flexible
cell-vlevel systolic architecture for high-throughput implementation
of 2-D FIR filters,” Proc. IET Comput. Digit. Tech., vol. 143, no. 6,
pp. 436–439, Nov. 1996.
[9] B. K. Mohanty and P. K. Meher, “High-throughput and low-latency
implementation of bit-level systolic architecture for 1-D and 2-D digital
filters,” IET Proc. Computer Digital Technique, vol. 146, no. 2, pp.
91–99, Mar. 1999.
[10] L. D. Van, C. C. Tang, S. Tenqchen, and W. S. Feng, “New VLSI ar-
chitecture without global broadcast for 2-D systolic digital filters,” in
Proc. IEEE Int. Symp. Circuis Syst. ISCAS, Geneva, Switzerland, May
2000, vol. 1, pp. 547–550.
[11] L. D. Van, “A new 2-D systolic digital filters architecture without
global broadcast,” IEEE Trans. Very Large Scale Integr. Syst., vol. 10,
no. 4, pp. 477–486, Aug. 2002.
[12] H. C. Reddy, I. H. Khoo, and P. K. Rajan, “2-D symmetry: Theory and
filter design applications,” IEEE Circuits System Mag., vol. 3, no. 3,
pp. 4–33, 3rd Q 2003.
[13] P. Y. Chen, L. D. Van, H. C. Reddy, and C. T. Lin, “A new VLSI 2-D
diagonal-symmetry filter architecture design,” in Proc. IEEE APCCAS,
Macao, China, Nov. 2008, pp. 320–323.
[14] I. H. Khoo, H. C. Reddy, L. D. Van, and C. T. Lin, “2-D digital filter ar-
chitectures without global broadcast and some symmetry applications,”
in Proc. IEEE Int. Symp. Circuis Syst. ISCAS, May 2009, pp. 952–955.
[15] P. Y. Chen, L. D. Van, H. C. Reddy, and C. T. Lin, “A new VLSI 2-D
fourfold-rotational-symmetry filter architecture design,” in Proc. IEEE
Int. Symp. Circuis Syst. (ISCAS), May 2009, pp. 93–96.
[16] I. H. Khoo, H. C. Reddy, L. D. Van, and C. T. Lin, “Generalized formu-
lation of 2-D filter structures without global broadcast for VLSI imple-
mentation,” in Proc., IEEE MWSCAS, Seattle, WA, USA, Aug. 2010,
pp. 426–529.
[17] P. Y. Chen, L. D. Van, I. H. Khoo, H. C. Reddy, and C. T. Lin, “Power-
efficient cost-effective 2-D symmetry filter architecture,” IEEE Trans.
Circuit Syst. I, Reg. Papers, vol. 58, no. 1, pp. 112–125, Jan. 2011.
[18] K. K. Parhi, VLSI Digital Signal Processing. New York, NY, USA:
Wiley, 1998.
[19] C. Y. Yao, H. H. Chen, C. J. Chien, and C. T. Hsu, “A novel common-
subexpression elimination method for synthesizing fixed-point FIR fil-
ters,” IEEE Trans. Circuit Syst. I, Reg. Papers, vol. 51, no. 11, pp.
2215–2221, Nov. 2004.
[20] Y. Voronenko and M. Puschel, “Multiplierless multiple constant mul-
tiplication,” ACM Trans. Algorithms, vol. 3, no. 2, May 2007, Article
11.
[21] A. P. Vinod and E. M.-K. Lai, “An efficient coefficient-partitioning
algorithm for realizing low complexity digital filters,” IEEE Trans.
Comput.-Aided Design Integr. Circuits Syst., vol. 24, no. 12, pp.
1936–1946, Dec. 2005.
MOHANTY et al.: MEMORY FOOTPRINT REDUCTION FOR POWER-EFFICIENT REALIZATION 133
[22] R. Mahesh and A. P. Vinod, “A new common subexpression elimina-
tion algorithm for realizing low complexity higher order digital filters,”
IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no.
2, pp. 217–219, Feb. 2008.
[23] J. Luis, T.-X. , R. M. A.-P. , and M. Bayoumi, “Hybrid multiplierless
FIR filter architecture based on NEDA,” in Proc. IFIP Int. Conf. Very
Large Scale Integration (VLSI-SoC 2007), 2007, pp. 316–319.
[24] P. K. Meher, S. Chandrasekaran, and A. Amira, “FPGA realization of
FIR filters by efficient and flexible systolization using distributed arith-
metic,” IEEE Trans. Signal Process., vol. 56, no. 7, pp. 3009–3017, Jul.
2008.
[25] P. K. Meher, “New approach to look-up table design and memory-
based realization of FIR digital filters,” IEEE Trans. Comput.-Aided
Design Integr. Circuits Syst., vol. 57, no. 3, pp. 592–603, Mar. 2010.
[26] A. Croisier, D. J. Esteban, M. E. Levilion, and V. Rizo, “Digital Filter
for PCM Encoded Signals,” U.S. Patent 3777130, Apr. 12, 1973.
[27] S. A. White, “Applications of the distributed arithmetic to digital signal
processing: A tutorial review,” IEEE ASSP Mag., vol. 6, no. 3, pp.
5–19, July 1989.
[28] R. I. Hartley, “Subexpression sharing in filters using canonic signed
digit multipliers,” IEEE Trans. Circuits Syst. II, Analog Digital Signal
Process., vol. 43, no. 10, pp. 677–688, Oct. 1996.
[29] B. K. Mohanty and P. K. Meher, “New scan method and pipeline archi-
tecture for VLSI implementation of separable 2-D FIR filters without
using transposition,” in Proc. IEEE Region 10 TENCON2008 Conf.,
Hyderabad, India, Nov. 2008.
Basant K. Mohanty (M’06–SM’11) received the
M.Sc. degree in physics from Sambalpur University,
India, in 1989 and the Ph.D. degree in the field of
VLSI for Digital Signal Processing from Berhampur
University, Orissa, in 2000.
In 2001 he joined as Lecturer in EEE Department,
BITS Pilani, Rajasthan. Then he joined as an Assis-
tant Professor in the Department of ECE, Mody Insti-
tute of Education Research (Deemed University), Ra-
jasthan. In 2003 he joined Jaypee University of En-
gineering and Technology, Guna, Madhya Pradesh,
where he became Associate Professor in 2005 and full Professor in 2007. His
research interest includes design and implementation of low-power and high-
performance systems for adaptive filters, image and video-processing appli-
cations, secured communication and reconfigurable architectures. He has pub-
lished nearly 50 technical papers.
Dr. Mohanty is serving as Associate Editor for the Journal of Circuits, Sys-
tems, and Signal Processing. He is a life time member of the Institution of Elec-
tronics and Telecommunication Engineering, New Delhi, India, and he was the
recipient of the Rashtriya Gaurav Award conferred by India International friend-
ship Society, New Delhi for the year 2012.
Pramod Kumar Meher (SM’03) received the M.Sc.
degree in physics and the Ph.D. degree in science
from Sambalpur University, India, in 1978, and 1996,
respectively.
Currently, he is a Senior Scientist with the Institute
for Infocomm Research, Singapore. Previously, he
was a Professor of Computer Applications with
Utkal University, India, from 1997 to 2002, and a
Reader in electronics with Berhampur University,
India, from 1993 to 1997. His research interest
includes design of dedicated and reconfigurable
architectures for computation-intensive algorithms pertaining to signal, image
and video processing, communication, bio-informatics and intelligent com-
puting. He has contributed nearly 200 technical papers to various reputed
journals and conference proceedings.
Dr. Meher has served as a speaker for the Distinguished Lecturer Program
(DLP) of IEEE Circuits Systems Society during 2011 and 2012 and Associate
Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS
BRIEFS during 2008 to 2011. Currently, he is serving as Associate Editor for the
IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, the
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS,
and Journal of Circuits, Systems, and Signal Processing. Dr. Meher is a Fellow
of the Institution of Electronics and Telecommunication Engineers, India. He
was the recipient of the Samanta Chandrasekhar Award for excellence in re-
search in engineering and technology for 1999.
Somaya Al-Maadeed (SM’12) received the B.Sc. degree in computer science
from Qatar University, Qatar, in 1994, the M.Sc. degree in mathematics and
computer science from Alexandria University, Egypt, in 1999, and the Ph.D. in
computer science from Nottingham University, Nottingham, U.K., in 2004.
Following her Ph.D., she worked as an Assistant Professor with Qatar Uni-
versity, where she did research in the areas of biometrics, digital filters, speech
recognition, image processing, and document management. She has published
around forty papers in the above general areas.
Abbes Amira (M’01–SM’07) received the Ph.D.
degree in the area of electronic and computer
engineering from Queens University Belfast, U.K.,
in 2001, developing a coprocessor for matrix algo-
rithms using FPGAs.
He is a Full Professor in visual communication at
the University of the West of Scotland (UWS), Scot-
land. Prior to joining UWS, he took academic po-
sitions at the university of Ulster-UK as Reader in
embedded systems in the Nanotechnology and Inte-
grated Bio-Engineering Centre (NIBEC); Associate
Professor in Embedded Systems in the department of electrical engineering at
Qatar University-Qatar, Senior lecturer at Brunel University-UK within the di-
vision of Electronic and Computer Engineering and a lectureship in Computer
Engineering at Queens University. He has been awarded a number of grants
from government and industry and has published around 200 publications in
the area of reconfigurable computing and image and signal processing during his
career to date. Twelve Ph.D. students have successfully completed their Ph.D.
under his supervision. He took consultancy positions with many companies in
UK, and he holds two visiting professor positions at the University of Nancy,
Henri Poincare, France and University of Tunn Hussein Onn, Malaysia. His re-
search interests include: embedded systems, high performance reconfigurable
computing, image and video processing, multi-resolution analysis, biometrics
and connected health applications.
Prof. Amira has been invited to give talks, short courses and tutorials at uni-
versities and international conferences and being chair, program committee for
a number of conferences. He was one of the tutorials presenters at ICIP 2009,
Conference Chair of ECVW 2011, Program Chair of ECVW2010, Program
Co-Chair of ICM12, DELTA 2008, IMVIP 2005. He is also one of the 2008
VARIAN prize recipients. He has been an external examiner for many Universi-
ties in UK, Hong Kong, Australia and Malaysia. He was one of the guest editors
for the Special Issue in the Pattern Recognition Journal, titled Feature Gener-
ation and Machine Learning for Robust Multimodal Biometrics, March 2008.
He is a Fellow of IET, Fellow of the Higher Education Academy, and Senior
member of ACM.

More Related Content

What's hot

A survey of data recovery on flash memory
A survey of data recovery on flash memory A survey of data recovery on flash memory
A survey of data recovery on flash memory IJECEIAES
 
Reversible encrypted data concealment in images by reserving room approach
Reversible encrypted data concealment in images by reserving room approachReversible encrypted data concealment in images by reserving room approach
Reversible encrypted data concealment in images by reserving room approachIAEME Publication
 
Diametrical Mesh of Tree (D2D-MoT) Architecture: A Novel Routing Solution for...
Diametrical Mesh of Tree (D2D-MoT) Architecture: A Novel Routing Solution for...Diametrical Mesh of Tree (D2D-MoT) Architecture: A Novel Routing Solution for...
Diametrical Mesh of Tree (D2D-MoT) Architecture: A Novel Routing Solution for...IDES Editor
 
Effective Compression of Digital Video
Effective Compression of Digital VideoEffective Compression of Digital Video
Effective Compression of Digital VideoIRJET Journal
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
Using Compression Techniques to Streamline Image and Video Storage and Retrieval
Using Compression Techniques to Streamline Image and Video Storage and RetrievalUsing Compression Techniques to Streamline Image and Video Storage and Retrieval
Using Compression Techniques to Streamline Image and Video Storage and RetrievalCognizant
 
An overview Survey on Various Video compressions and its importance
An overview Survey on Various Video compressions and its importanceAn overview Survey on Various Video compressions and its importance
An overview Survey on Various Video compressions and its importanceINFOGAIN PUBLICATION
 
System partitioning in VLSI and its considerations
System partitioning in VLSI and its considerationsSystem partitioning in VLSI and its considerations
System partitioning in VLSI and its considerationsSubash John
 
Looking beyond moores_law_deepak
Looking beyond moores_law_deepakLooking beyond moores_law_deepak
Looking beyond moores_law_deepakDeepak Sekar
 
Buffer Trees - Utility and Applications for External Memory Data Processing
Buffer Trees - Utility and Applications for External Memory Data ProcessingBuffer Trees - Utility and Applications for External Memory Data Processing
Buffer Trees - Utility and Applications for External Memory Data ProcessingMilind Gokhale
 
3D Microprocessor Design: Stacking at different granularities
3D Microprocessor Design: Stacking at different granularities3D Microprocessor Design: Stacking at different granularities
3D Microprocessor Design: Stacking at different granularitiesAlberto Villegas
 

What's hot (14)

E04552327
E04552327E04552327
E04552327
 
A survey of data recovery on flash memory
A survey of data recovery on flash memory A survey of data recovery on flash memory
A survey of data recovery on flash memory
 
Reversible encrypted data concealment in images by reserving room approach
Reversible encrypted data concealment in images by reserving room approachReversible encrypted data concealment in images by reserving room approach
Reversible encrypted data concealment in images by reserving room approach
 
Diametrical Mesh of Tree (D2D-MoT) Architecture: A Novel Routing Solution for...
Diametrical Mesh of Tree (D2D-MoT) Architecture: A Novel Routing Solution for...Diametrical Mesh of Tree (D2D-MoT) Architecture: A Novel Routing Solution for...
Diametrical Mesh of Tree (D2D-MoT) Architecture: A Novel Routing Solution for...
 
Effective Compression of Digital Video
Effective Compression of Digital VideoEffective Compression of Digital Video
Effective Compression of Digital Video
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
Ih2514851489
Ih2514851489Ih2514851489
Ih2514851489
 
Using Compression Techniques to Streamline Image and Video Storage and Retrieval
Using Compression Techniques to Streamline Image and Video Storage and RetrievalUsing Compression Techniques to Streamline Image and Video Storage and Retrieval
Using Compression Techniques to Streamline Image and Video Storage and Retrieval
 
45 135-1-pb
45 135-1-pb45 135-1-pb
45 135-1-pb
 
An overview Survey on Various Video compressions and its importance
An overview Survey on Various Video compressions and its importanceAn overview Survey on Various Video compressions and its importance
An overview Survey on Various Video compressions and its importance
 
System partitioning in VLSI and its considerations
System partitioning in VLSI and its considerationsSystem partitioning in VLSI and its considerations
System partitioning in VLSI and its considerations
 
Looking beyond moores_law_deepak
Looking beyond moores_law_deepakLooking beyond moores_law_deepak
Looking beyond moores_law_deepak
 
Buffer Trees - Utility and Applications for External Memory Data Processing
Buffer Trees - Utility and Applications for External Memory Data ProcessingBuffer Trees - Utility and Applications for External Memory Data Processing
Buffer Trees - Utility and Applications for External Memory Data Processing
 
3D Microprocessor Design: Stacking at different granularities
3D Microprocessor Design: Stacking at different granularities3D Microprocessor Design: Stacking at different granularities
3D Microprocessor Design: Stacking at different granularities
 

Viewers also liked

831 panda public-auditing-for-shared-data-with-efficient-user-revocation-in-t...
831 panda public-auditing-for-shared-data-with-efficient-user-revocation-in-t...831 panda public-auditing-for-shared-data-with-efficient-user-revocation-in-t...
831 panda public-auditing-for-shared-data-with-efficient-user-revocation-in-t...Raja Ram
 
Automatic face naming by learning discriminative affinity matrices from weakl...
Automatic face naming by learning discriminative affinity matrices from weakl...Automatic face naming by learning discriminative affinity matrices from weakl...
Automatic face naming by learning discriminative affinity matrices from weakl...Raja Ram
 
Integrating mobile access with university data processing in the cloud
Integrating mobile access with university data processing in the cloudIntegrating mobile access with university data processing in the cloud
Integrating mobile access with university data processing in the cloudRaja Ram
 
Multi transmit beam forming for fast cardiac
Multi transmit beam forming for fast cardiacMulti transmit beam forming for fast cardiac
Multi transmit beam forming for fast cardiacRaja Ram
 
How to build and automate your first process in IntelliPro BPMS?
How to build and automate your first process in IntelliPro BPMS?How to build and automate your first process in IntelliPro BPMS?
How to build and automate your first process in IntelliPro BPMS?iLeap
 
Python First Class
Python First ClassPython First Class
Python First ClassYao Zuo
 
IntelliPro BPMS - Process Frameworks
IntelliPro BPMS - Process FrameworksIntelliPro BPMS - Process Frameworks
IntelliPro BPMS - Process FrameworksiLeap
 

Viewers also liked (7)

831 panda public-auditing-for-shared-data-with-efficient-user-revocation-in-t...
831 panda public-auditing-for-shared-data-with-efficient-user-revocation-in-t...831 panda public-auditing-for-shared-data-with-efficient-user-revocation-in-t...
831 panda public-auditing-for-shared-data-with-efficient-user-revocation-in-t...
 
Automatic face naming by learning discriminative affinity matrices from weakl...
Automatic face naming by learning discriminative affinity matrices from weakl...Automatic face naming by learning discriminative affinity matrices from weakl...
Automatic face naming by learning discriminative affinity matrices from weakl...
 
Integrating mobile access with university data processing in the cloud
Integrating mobile access with university data processing in the cloudIntegrating mobile access with university data processing in the cloud
Integrating mobile access with university data processing in the cloud
 
Multi transmit beam forming for fast cardiac
Multi transmit beam forming for fast cardiacMulti transmit beam forming for fast cardiac
Multi transmit beam forming for fast cardiac
 
How to build and automate your first process in IntelliPro BPMS?
How to build and automate your first process in IntelliPro BPMS?How to build and automate your first process in IntelliPro BPMS?
How to build and automate your first process in IntelliPro BPMS?
 
Python First Class
Python First ClassPython First Class
Python First Class
 
IntelliPro BPMS - Process Frameworks
IntelliPro BPMS - Process FrameworksIntelliPro BPMS - Process Frameworks
IntelliPro BPMS - Process Frameworks
 

Similar to 06542014

IRJET-An Efficient Real-Time Controller for Retrieving Multimedia Data from S...
IRJET-An Efficient Real-Time Controller for Retrieving Multimedia Data from S...IRJET-An Efficient Real-Time Controller for Retrieving Multimedia Data from S...
IRJET-An Efficient Real-Time Controller for Retrieving Multimedia Data from S...IRJET Journal
 
UNIT 3 Memory Design for SOC.ppUNIT 3 Memory Design for SOC.pptx
UNIT 3 Memory Design for SOC.ppUNIT 3 Memory Design for SOC.pptxUNIT 3 Memory Design for SOC.ppUNIT 3 Memory Design for SOC.pptx
UNIT 3 Memory Design for SOC.ppUNIT 3 Memory Design for SOC.pptxSnehaLatha68
 
Controller design for multichannel nand flash memory for higher efficiency in...
Controller design for multichannel nand flash memory for higher efficiency in...Controller design for multichannel nand flash memory for higher efficiency in...
Controller design for multichannel nand flash memory for higher efficiency in...eSAT Journals
 
FPGA Implementation of Multiplier-less CDF-5/3 Wavelet Transform for Image Pr...
FPGA Implementation of Multiplier-less CDF-5/3 Wavelet Transform for Image Pr...FPGA Implementation of Multiplier-less CDF-5/3 Wavelet Transform for Image Pr...
FPGA Implementation of Multiplier-less CDF-5/3 Wavelet Transform for Image Pr...IOSRJVSP
 
An Area Efficient Mixed Decimation MDF Architecture for Radix 22 Parallel FFT
An Area Efficient Mixed Decimation MDF Architecture for Radix 22  Parallel FFTAn Area Efficient Mixed Decimation MDF Architecture for Radix 22  Parallel FFT
An Area Efficient Mixed Decimation MDF Architecture for Radix 22 Parallel FFTIRJET Journal
 
Efficient and scalable multitenant placement approach for in memory database ...
Efficient and scalable multitenant placement approach for in memory database ...Efficient and scalable multitenant placement approach for in memory database ...
Efficient and scalable multitenant placement approach for in memory database ...CSITiaesprime
 
Top 10 Supercomputers With Descriptive Information & Analysis
Top 10 Supercomputers With Descriptive Information & AnalysisTop 10 Supercomputers With Descriptive Information & Analysis
Top 10 Supercomputers With Descriptive Information & AnalysisNomanSiddiqui41
 
RISC Implementation Of Digital IIR Filter in DSP
RISC Implementation Of Digital IIR Filter in DSPRISC Implementation Of Digital IIR Filter in DSP
RISC Implementation Of Digital IIR Filter in DSPiosrjce
 
Reliable Hydra SSD Architecture for General Purpose Controllers
Reliable Hydra SSD Architecture for General Purpose ControllersReliable Hydra SSD Architecture for General Purpose Controllers
Reliable Hydra SSD Architecture for General Purpose ControllersIJMER
 
Architecture and implementation issues of multi core processors and caching –...
Architecture and implementation issues of multi core processors and caching –...Architecture and implementation issues of multi core processors and caching –...
Architecture and implementation issues of multi core processors and caching –...eSAT Publishing House
 
Design and Simulation of a 16kb Memory using Memory Banking technique
Design and Simulation of a 16kb Memory using Memory Banking techniqueDesign and Simulation of a 16kb Memory using Memory Banking technique
Design and Simulation of a 16kb Memory using Memory Banking techniqueIRJET Journal
 
High Performance Medical Reconstruction Using Stream Programming Paradigms
High Performance Medical Reconstruction Using Stream Programming ParadigmsHigh Performance Medical Reconstruction Using Stream Programming Paradigms
High Performance Medical Reconstruction Using Stream Programming ParadigmsQuEST Global (erstwhile NeST Software)
 
A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithm...
A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithm...A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithm...
A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithm...IOSR Journals
 
Data De-Duplication Engine for Efficient Storage Management
Data De-Duplication Engine for Efficient Storage ManagementData De-Duplication Engine for Efficient Storage Management
Data De-Duplication Engine for Efficient Storage ManagementIRJET Journal
 

Similar to 06542014 (20)

IRJET-An Efficient Real-Time Controller for Retrieving Multimedia Data from S...
IRJET-An Efficient Real-Time Controller for Retrieving Multimedia Data from S...IRJET-An Efficient Real-Time Controller for Retrieving Multimedia Data from S...
IRJET-An Efficient Real-Time Controller for Retrieving Multimedia Data from S...
 
UNIT 3 Memory Design for SOC.ppUNIT 3 Memory Design for SOC.pptx
UNIT 3 Memory Design for SOC.ppUNIT 3 Memory Design for SOC.pptxUNIT 3 Memory Design for SOC.ppUNIT 3 Memory Design for SOC.pptx
UNIT 3 Memory Design for SOC.ppUNIT 3 Memory Design for SOC.pptx
 
Controller design for multichannel nand flash memory for higher efficiency in...
Controller design for multichannel nand flash memory for higher efficiency in...Controller design for multichannel nand flash memory for higher efficiency in...
Controller design for multichannel nand flash memory for higher efficiency in...
 
FPGA Implementation of Multiplier-less CDF-5/3 Wavelet Transform for Image Pr...
FPGA Implementation of Multiplier-less CDF-5/3 Wavelet Transform for Image Pr...FPGA Implementation of Multiplier-less CDF-5/3 Wavelet Transform for Image Pr...
FPGA Implementation of Multiplier-less CDF-5/3 Wavelet Transform for Image Pr...
 
An Area Efficient Mixed Decimation MDF Architecture for Radix 22 Parallel FFT
An Area Efficient Mixed Decimation MDF Architecture for Radix 22  Parallel FFTAn Area Efficient Mixed Decimation MDF Architecture for Radix 22  Parallel FFT
An Area Efficient Mixed Decimation MDF Architecture for Radix 22 Parallel FFT
 
Efficient and scalable multitenant placement approach for in memory database ...
Efficient and scalable multitenant placement approach for in memory database ...Efficient and scalable multitenant placement approach for in memory database ...
Efficient and scalable multitenant placement approach for in memory database ...
 
Top 10 Supercomputers With Descriptive Information & Analysis
Top 10 Supercomputers With Descriptive Information & AnalysisTop 10 Supercomputers With Descriptive Information & Analysis
Top 10 Supercomputers With Descriptive Information & Analysis
 
RISC Implementation Of Digital IIR Filter in DSP
RISC Implementation Of Digital IIR Filter in DSPRISC Implementation Of Digital IIR Filter in DSP
RISC Implementation Of Digital IIR Filter in DSP
 
Bm34399403
Bm34399403Bm34399403
Bm34399403
 
Reliable Hydra SSD Architecture for General Purpose Controllers
Reliable Hydra SSD Architecture for General Purpose ControllersReliable Hydra SSD Architecture for General Purpose Controllers
Reliable Hydra SSD Architecture for General Purpose Controllers
 
Lc3519051910
Lc3519051910Lc3519051910
Lc3519051910
 
Architecture and implementation issues of multi core processors and caching –...
Architecture and implementation issues of multi core processors and caching –...Architecture and implementation issues of multi core processors and caching –...
Architecture and implementation issues of multi core processors and caching –...
 
UNIT 3.docx
UNIT 3.docxUNIT 3.docx
UNIT 3.docx
 
Design and Simulation of a 16kb Memory using Memory Banking technique
Design and Simulation of a 16kb Memory using Memory Banking techniqueDesign and Simulation of a 16kb Memory using Memory Banking technique
Design and Simulation of a 16kb Memory using Memory Banking technique
 
High Performance Medical Reconstruction Using Stream Programming Paradigms
High Performance Medical Reconstruction Using Stream Programming ParadigmsHigh Performance Medical Reconstruction Using Stream Programming Paradigms
High Performance Medical Reconstruction Using Stream Programming Paradigms
 
Ku3419461949
Ku3419461949Ku3419461949
Ku3419461949
 
A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithm...
A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithm...A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithm...
A Novel Approach of Area-Efficient FIR Filter Design Using Distributed Arithm...
 
N045067680
N045067680N045067680
N045067680
 
Design and performance analysis of efficient hybrid mode multi-ported memory...
Design and performance analysis of efficient hybrid mode  multi-ported memory...Design and performance analysis of efficient hybrid mode  multi-ported memory...
Design and performance analysis of efficient hybrid mode multi-ported memory...
 
Data De-Duplication Engine for Efficient Storage Management
Data De-Duplication Engine for Efficient Storage ManagementData De-Duplication Engine for Efficient Storage Management
Data De-Duplication Engine for Efficient Storage Management
 

More from Raja Ram

Heterogeneous heed protocol for wireless sensor networks springer
Heterogeneous heed protocol for wireless sensor networks   springerHeterogeneous heed protocol for wireless sensor networks   springer
Heterogeneous heed protocol for wireless sensor networks springerRaja Ram
 
Face recognition across non uniform motion blur, illumination, and pose
Face recognition across non uniform motion blur, illumination, and poseFace recognition across non uniform motion blur, illumination, and pose
Face recognition across non uniform motion blur, illumination, and poseRaja Ram
 
Detection and rectification of distorted fingerprints
Detection and rectification of distorted fingerprintsDetection and rectification of distorted fingerprints
Detection and rectification of distorted fingerprintsRaja Ram
 
A method for detecting abnormal program behavior on embedded devices
A method for detecting abnormal program behavior on embedded devicesA method for detecting abnormal program behavior on embedded devices
A method for detecting abnormal program behavior on embedded devicesRaja Ram
 
A novel automated identification of intima media thickness in ultrasound
A novel automated identification of intima media thickness in ultrasoundA novel automated identification of intima media thickness in ultrasound
A novel automated identification of intima media thickness in ultrasoundRaja Ram
 
Multi transmit beam forming for fast cardiac
Multi transmit beam forming for fast cardiacMulti transmit beam forming for fast cardiac
Multi transmit beam forming for fast cardiacRaja Ram
 
Novel modilo based aloha anti collision
Novel modilo based aloha anti collisionNovel modilo based aloha anti collision
Novel modilo based aloha anti collisionRaja Ram
 
A new secure image transmission technique via secret
A new secure image transmission technique  via secretA new secure image transmission technique  via secret
A new secure image transmission technique via secretRaja Ram
 

More from Raja Ram (8)

Heterogeneous heed protocol for wireless sensor networks springer
Heterogeneous heed protocol for wireless sensor networks   springerHeterogeneous heed protocol for wireless sensor networks   springer
Heterogeneous heed protocol for wireless sensor networks springer
 
Face recognition across non uniform motion blur, illumination, and pose
Face recognition across non uniform motion blur, illumination, and poseFace recognition across non uniform motion blur, illumination, and pose
Face recognition across non uniform motion blur, illumination, and pose
 
Detection and rectification of distorted fingerprints
Detection and rectification of distorted fingerprintsDetection and rectification of distorted fingerprints
Detection and rectification of distorted fingerprints
 
A method for detecting abnormal program behavior on embedded devices
A method for detecting abnormal program behavior on embedded devicesA method for detecting abnormal program behavior on embedded devices
A method for detecting abnormal program behavior on embedded devices
 
A novel automated identification of intima media thickness in ultrasound
A novel automated identification of intima media thickness in ultrasoundA novel automated identification of intima media thickness in ultrasound
A novel automated identification of intima media thickness in ultrasound
 
Multi transmit beam forming for fast cardiac
Multi transmit beam forming for fast cardiacMulti transmit beam forming for fast cardiac
Multi transmit beam forming for fast cardiac
 
Novel modilo based aloha anti collision
Novel modilo based aloha anti collisionNovel modilo based aloha anti collision
Novel modilo based aloha anti collision
 
A new secure image transmission technique via secret
A new secure image transmission technique  via secretA new secure image transmission technique  via secret
A new secure image transmission technique via secret
 

Recently uploaded

VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Bookingdharasingh5698
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Christo Ananth
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdfKamal Acharya
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...ranjana rawat
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxupamatechverse
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxpranjaldaimarysona
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 

Recently uploaded (20)

VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
 
Introduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptxIntroduction to IEEE STANDARDS and its different types.pptx
Introduction to IEEE STANDARDS and its different types.pptx
 
Processing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptxProcessing & Properties of Floor and Wall Tiles.pptx
Processing & Properties of Floor and Wall Tiles.pptx
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur EscortsCall Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
Call Girls Service Nagpur Tanvi Call 7001035870 Meet With Nagpur Escorts
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service NashikCall Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
Call Girls Service Nashik Vaishnavi 7001305949 Independent Escort Service Nashik
 

06542014

  • 1. 120 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 61, NO. 1, JANUARY 2014 Memory Footprint Reduction for Power-Efficient Realization of 2-D Finite Impulse Response Filters Basant K. Mohanty, Senior Member, IEEE, Pramod K. Meher, Senior Member, IEEE, Somaya Al-Maadeed, Senior Member, IEEE, and Abbes Amira, Senior Member, IEEE Abstract—We have analyzed memory footprint and combina- tional complexity to arrive at a systematic design strategy to derive area-delay-power-efficient architectures for two-dimensional (2-D) finite impulse response (FIR) filter. We have presented novel block- based structures for separable and non-separable filters with less memory footprint by memory sharing and memory-reuse along with appropriate scheduling of computations and design of storage architecture. The proposed structures involve times less storage per output (SPO), and nearly times less energy consumption per output (EPO) compared with the existing structures, where is the input block-size. They involve times more arithmetic resources than the best of the corresponding existing structures, and produce times more throughput with less memory band-width (MBW) than others. We have also proposed separate generic structures for separable and non-separable filter-banks, and a unified structure of filter-bank constituting symmetric and general filters. The pro- posed unified structure for 6 parallel filters involves nearly times more multipliers, times more adders, less registers than similar existing unified structure, and computes times more filter outputs per cycle with times less MBW than the existing design, where is FIR filter size in each dimension. ASIC synthesis result shows that for filter size (4 4), input-block size , and image-size (512 512), proposed block-based non-separable and generic non-separable structures, respectively, involve 5.95 times and 11.25 times less area-delay-product (ADP), and 5.81 times and 15.63 times less EPO than the corresponding existing structures. The proposed unified structure involves 4.64 times less ADP and 9.78 times less EPO than the corresponding existing structure. Index Terms—Block processing, 2-dimensional (2-D) finite im- pulse response (FIR), digital filters, VLSI architecture. I. INTRODUCTION TWO-DIMENSIONAL (2-D) digital filters are frequently used in 2-D signal processing as well as the image and video processing applications such as image enhancement, Manuscript received December 09, 2012; revised February 16, 2013; accepted March 31, 2013. Date of publication June 17, 2013; date of current version January 06, 2014. This work was supported by an internal grant from Qatar University (QUUG-CENG-DCS-11/12-7). The statements made herein are solely the responsibility of the authors. This paper was recommended by Associate Editor F. Clermidy. B. K. Mohanty with the Department of Electronics and Communication Engi- neering, Jaypee University of Engineering and Technology, Raghogarh, Guna, Madhy Pradesh 473226, India (e-mail: bk.mohanti@juet.ac.in). P. K. Meher is with Institute for Infocomm Research, 138632 Singapore (e-mail: pkmeher@i2r.a-star.edu.sg). S. Al-Maadeed with Department of Computer Science and Engineering, Qatar University, Doha, Qatar (e-mail: s_alali@qu.edu.qa). A. Amira with School of Computing, University of West Scotland, Paisley, Scotland, UK, (e-mail: abbes.amira@uws.ac.uk). Digital Object Identifier 10.1109/TCSI.2013.2265953 image restoration [1], template matching [2], face recognition, feature extraction for bio-metric systems [3]–[5], and video communication etc. The 2-D FIR filters are more popularly used compared to its infinite impulse response (IIR) counterpart due to their numerical stability and simplicity of design. The system function of 2-D FIR filter is often non-separable, while in a few cases, it is separable when impulse response is expressed as . The non-separable and separable system functions are, respectively, given as: (1) (2) where is the impulse response matrix of the non-sepa- rable 2-D FIR filter of size while and are the impulse responses of 1-D FIR filters used for row-wise and column-wise processing of 2-D input. Block diagrams of conventional realization of non-separable and separable 2-D FIR filters are shown in Fig. 1. As shown in this figure, both these filter structures consist of two types of hardware components: (i) the combinational component and (ii) the memory or storage component. The combinational compo- nent consists mainly of the arithmetic circuits along with some steering logic like multiplexors and demultiplexers, while the storage component consists of transposition buffers and/or shift- registers to provide appropriate data to combinational units. The non-separable structure uses shift-registers to introduce the nec- essary row-delays for the processing of intermediate data while the separable structure uses shift-registers for transposition of intermediate data. We can find from (1) that, a non-separable 2-D FIR filter of size involves shift-regis- ters (SRs) of size each, registers (for row-column processing), and multipliers and adders to compute one filter output per cycle. Similarly, we can find from (2) that, the separable 2-D filter of size involves SRs of words each, registers, multipliers and adders to compute one filter output per cycle. Combinational and memory (register) complexities of full-parallel non-separable and sepa- rable filters are given in Table I. Since, the image size is higher than the filter-size by more than an order of mag- nitude in most of the image-processing applications, memory becomes the dominant component of hardware complexity of 2-D FIR structures, and consumes major amount of chip-area and total power consumption. 1549-8328 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
  • 2. MOHANTY et al.: MEMORY FOOTPRINT REDUCTION FOR POWER-EFFICIENT REALIZATION 121 Fig. 1. Conventional structure of 2-D FIR filter. (a) Non-separable method, (b) Separable method. TABLE I COMBINATIONAL AND MEMORY COMPLEXITY OF FULL-PARALLEL SEPARABLE AND NON-SEPARABLE 2-D FIR FILTERS Some systolic architectures have been suggested for VLSI implementation of 2-D FIR filters to achieve high-throughput and low-latency implementation [6]–[11]. Recently, some effi- cient structures have been proposed for 2-D IIR filters [13]–[17] and shown that non-separable 2-D FIR filters can also be real- ized efficiently using those structures. In all these existing de- signs, systolization [18] of the structure is considered as the major issue, and a substantially large number of delay elements are placed in the data-path to avoid global communication. Sim- ilarly, the symmetry of impulse response matrix has been ex- ploited to reduce the hardware and time complexities of the structures [12]–[17]. In the last four decades, several design ap- proaches have been suggested for reducing the arithmetic com- plexity of one-dimensional (1-D) FIR filters [19]–[28]. All those methods are now quite mature, and can be applied to reduce the complexity of 1-D filters, which could be used for the imple- mentation of 2-D FIR filters as well. On the other hand, memory complexity constitutes the dominant part of overall area com- plexity of these structures, and plays a significant role in power- efficient realization of 2-D FIR filters. Keeping that in view, in this paper, we present memory-centric designs of 2-D FIR filter for the reduction of total memory usage, memory-reuse, reduc- tion of memory band-width and memory-sharing, which could lead to an area-delay-power-efficient structures. Many image processing applications use 2-D filter banks comprised of both separable and non-separable filters. One such application is biometric system where Gabor filter bank is used for feature extraction for face recognition and finger-print matching [3]–[5]. The Gabor filter bank is generated for different center frequencies and orientations. Consequently, constituent filters are both separable and non-separable types with symmetry as well as without symmetry. Recently, a unified structure has been proposed in [17] for the realization of filters with diagonal, four-fold rotational, quadrant and octal symmetries. However, this structure does not favour the realization a filter-bank since only one filter can be realized at a time. Moreover, separable filters can not be realized using this structure. The main objective of unified structure of [17] is to achieve saving of arithmetic resources, but in this paper, we aim at presenting a unified memory-efficient structure for filter bank with separable and non-separable 2-D FIR filters. The main contributions of this paper are: • Analysis of memory footprint and exploration of possible storage optimization to have a memory-efficient design strategy for the implementation of 2-D FIR filter. • Shared memory design for separable and non-separable structures. • Scheduling of computations and architecture design for memory-reuse and memory band-width reduction in sepa- rable filters. • Unified structure of filter bank comprised of separable and non-separable filters with low memory footprint per output. The rest of the paper is organized as follows: The proposed design strategy is discussed in Section II and block formulation of 2-D FIR filter is given in Section III. Proposed structures are presented in Section IV for separable and non-separable filters. Generic structure of filter-bank consisting of different types of filters is presented in Section V. Hardware complexity and per- formance of the proposed structures are discussed in Section VI. Conclusion is presented in Section VII. II. PROPOSED DESIGN STRATEGY To arrive at the proposed design strategy we analyze here memory complexities of possible configurations of 2-D FIR filter. The system function of non-separable 2-D FIR filter (given by (1)) can be written in a split form as: (3a) (3b) Computations of (3a) and (3b) can be performed by a di- rect-form or a transposed-form structure to have four possible configurations such as fully-direct (direct-direct), fully-trans- posed (transpose-transpose), hybrid-1 (direct-transpose), and hybrid-2 (transpose-direct), for the realization of non-sepa- rable as shown in Fig. 2 for . All these four structures require the same number of arithmetic components (multiplier and adders) and delay elements and cor- responding to shift-registers and fixed registers, respectively) except their locations in the data-path. Since, the bit-widths of arithmetic units, buses and registers in the data-path are different for input signals and intermediate signals, the overall memory requirements of different configurations are different in terms of number of storage bits, although all of them have the same number of delay elements1. We have estimated memory complexity of all the four type of structures for an input image size 512 512 (i.e ) and filter length and ), and listed in Table II for comparison. We find that, fully-direct structure has the lowest memory requirement than others. Interestingly, the memory complexity of fully-direct 1The input image to be processed by the 2-D FIR filter consists of 8-bit pixels while higher bit-width is required for intermediate signals.
  • 3. 122 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 61, NO. 1, JANUARY 2014 Fig. 2. Four different configurations for realization of 2-D FIR filter for . (a) Fully-direct structure, (b) Hybrid-1 structure (c) Hybrid-2 structure, and (d) Fully-transposed structure, where represents a shift-register of words and represents a single register. TABLE II MEMORY COMPLEXITY FULLY-DIRECT, FULLY-TRANSPOSE, HYBRID-1 AND HYBRID-2 STRUCTURES. : FILTER-SIZE, : INPUT IMAGE-SIZE, : INPUT SIGNAL BIT-WIDTH AND : INTERMEDIATE SIGNAL BIT-WIDTH structure is independent of word-length of intermediate signals since all the delay elements of this structure are placed on the input path only. This is a very useful feature to be exploited for memory footprint reduction in 2-D FIR filter structure. A. Exploration of Memory-Reuse Possibilities To explore the memory reuse possibilities, let us consider the input data-flow of fully-direct structure for the computa- tion of -th row of outputs as shown in Fig. 3 for . The samples required to compute a given filter output is shown in a pair of curly braces and the corresponding filter output is shown at its right. Each arrow shows the source (shift-register/register) of samples. To compute each output of (4 4) filter, 16 input-sam- ples corresponding to 4 rows and 4 columns of 2-D input are required. Out of 4 rows (numbered as ), the -th row is the current input-row and others are immediate past rows. Memory-unit uses three shift-registers (SR-1, SR-2, SR-3) to buffer 3 past -th rows of input
  • 4. MOHANTY et al.: MEMORY FOOTPRINT REDUCTION FOR POWER-EFFICIENT REALIZATION 123 Fig. 3. Memory-unit data-flow of fully-direct structure for outputs , and . Redundant memory-values are shown by blue color boxes. samples. The required -th columns of sam- ples of a particular-row are provided by the serial-in parallel-out (SIPO) shift-register of words. As shown in Fig. 3, all the 16 input-samples required to compute the filter output are obtained from the memory-unit and current input. Therefore, memory used by fully-direct structure to compute each output is words, where SRs-unit provides words and fixed register-unit provides 12 words. Memory band-width (MBW) of the structure is 15 (read operations on SR-unit is 3, and 12 values are obtained from register-unit to compute an output). This can be generalized to find the number of memory words used by fully-direct structure to be and MBW to be . As shown in Fig. 3, in total 64 input samples are required to compute the outputs . These 64 samples belong to 4 rows and 7 columns of the input image. It can be noticed that out of 64 samples 28 sam- ples are different from each other, while redundancy exists in rest 36 samples. The redundant values corresponding to filter-output in the blue boxes in Fig. 3. These redundant-memory access could be avoided by parallel computation of filter outputs . In that case, four current input samples corresponding to four outputs need to be available in each cycle. Four input values of each of the past rows are retrieved from respective shift-registers in every cycle. To realize this, each of the shift-registers (of words) is required to be split into four equal parts of words. Therefore, 12 input values are obtained from the SR-unit in every cycle. Out of 28 unique samples, 4 are obtained as input samples of current cycle and 12 samples are obtained from the SR-unit. The remaining 12 samples are obtained from four SIPO shift-registers. The memory usage for parallel computation of 4 filter outputs is which is the same as the memory usage of the fully-direct structure for one output. MBW for parallel computation of four filter outputs is . The memory words required to compute a block of filter outputs per cycle of the 2-D FIR filter of size can therefore found to be and MBW can found to be . Interestingly, the storage space of fully-direct structure of 2-D FIR filter of size is the same as parallel compu- tation of filter outputs due to memory reuse during parallel computation. Only arithmetic resources increases proportion- ately with throughput rate. This is an important feature which can be utilized for area and power saving. The memory reuse efficiency (MRE) of block-based structure can be estimated as MRE [(total input words—actual memory usage)/ac- tual memory usage], where total input words is times the memory-usage of single-input single-output (SISO) structure. For block-based structure with block size . Therefore, MRE of block-based parallel structure increases proportionately with the block-size . Higher the MRE, lesser is the memory requirement per output. Consequently, the structure is more area-delay efficient compared with the SISO structure. MBW of block-based structure increases by a factor of instead of times, where is the input block-size. This is mainly due to the number of redundant samples . Higher the , better is the MBW reduction. We have estimated the memory band-width saving (MBS) using the formula MBS of SISO structure) MBW of block-based structure of block-size MBW of SISO structure). Therefore, we can have . Since, varies with and , MBS is higher for larger size filters with higher input block-size. We have estimated , MBS and MRE of parallel 2-D FIR structure for different filter sizes and input block-sizes . The estimated values are listed in Table III to quantify the scope of memory reuse. We can find from Table III that a block-based realization enhances memory-reuse and reduces memory-band width per output. Therefore, we have presented the block for- mulation and its subsequent implementation of 2-D FIR filters. B. Memory-Sharing in Generalized 2-D FIR Filter Structures The fully-direct non-separable structure as well as the con- ventional separable structure use shift-registers to store words each, but shift-registers of fully-direct non-separable structure stores input pixel values while those of separable structure stores intermediate values. Due to the difference in bit-width, a common shift-register unit cannot be shared by these two structures. A different design approach for separable filter is required where the shift-register unit stores the input
  • 5. 124 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 61, NO. 1, JANUARY 2014 TABLE III MEMORY REUSE EFFICIENCY AND MEMORY BAND-WIDTH SAVING FOR DIFFERENT SIZE FILTERS AND INPUT-BLOCK SIZE signal only. For this purpose, we have used an efficient decom- position scheme for 2-D separable filter in the following. The input-output relation of separable 2-D FIR filter, given by (2), can be written as: (4) Computation of (4) can be expressed in split form as: (5a) (5b) and can be expressed as inner-products of a pair of -point vectors and as (6a) (6b) and and are given by (7a) (7b) (7c) (7d) According to (5), input-vectors are fed to the row-filter in column overlapped order, and the row-filter generates inter- mediate values column-wise exactly in the same order as the Fig. 4. Separable 2-D FIR filter structure (for based on proposed decomposition scheme. column-filter consumes intermediate values. Consequently, transposition-unit in this case is comprised of fixed registers instead of shift-registers. Separable filter structure based on decomposition scheme of (5) is shown in Fig. 4 for . The input samples are fed as overlapped blocks for and , where successive blocks of a column are over- lapped by samples. The input-blocks are fed in row-serial order from shift-register unit comprising of shift-reg- isters of words each. We can find from Figs. 3 and 4 that shift-register unit of fully-direct non-separable structure and the separable structure based on this decomposition scheme has the same number of shift-registers and they are of the same size. Interestingly, the data-input and data-output formats of both these shift-register units are identical. This favours the pos- sible sharing of shift-register units of fully-direct non-separable structure and separable structure. This leads to a generalized structure for both non-separable and separable filters individ- ually or in parallel configuration. Data-flow of a shared shift-register unit is shown in Fig. 5 for , where the input-data requirement of both fully-di- rect non-separable and separable structures is taken care of by the shared shift-register unit. A shared shift-register unit not only offers memory-saving, but also allows parallel realization of both non-separable and separable filters. It is shown in the later Sections that the parallel implementation of generalized structure offers higher area-delay-power efficiency over sequen- tial structure. Keeping these facts in view, we have outlined here a systematic memory-centric design strategy to derive an area-delay-power efficient structure for 2-D FIR filter. • A fully-direct form structure should be used for non-sepa- rable filter to have less storage-complexity. • A block implementation of fully-direct structure should be used for MBW reduction. • Separable structure based on the proposed decomposition algorithm could be derived for shift-register sharing with non-separable filter structure. • Appropriate algorithm partitioning and scheduling scheme need to be used for separable filter to minimize memory bandwidth and increase register sharing. The register sharing property of proposed non-separable and separable filter structures are exploited further to derive generic structures. The proposed generic structures can be configured
  • 6. MOHANTY et al.: MEMORY FOOTPRINT REDUCTION FOR POWER-EFFICIENT REALIZATION 125 Fig. 5. Data-flow in a shared shift-register unit of fully-direct structure and separable structure. for realization of a single filter of types (separable, non-sepa- rable, symmetry (diagonal, four-fold rotational, quadrant)) or parallel realization of any combination of these filters. III. BLOCK FORMULATION OF 2-D FIR FILTERS A. For Non-Separable Filter Let us consider a non-separable filter which processes a block of input samples and generates a block of outputs in every cycle. The -th block of filter output of the -th row is computed by relation; (8) where and are defined as (9a) (9b) The intermediate vector is computed by product of input- matrix and impulse-response vector , and given by (10) The input matrix is derived from -th row of the input matrix of size and given by (11), shown at the bottom of the page. is given by: (12a) From (9) and (11), we find is the inner-product of -th row of matrix and , given by (13) B. For Separable 2-D FIR Filter Let us consider a separable filter which processes a block of input samples and generates a block of outputs in every cycle. The -th block of filter output of the -th row is computed in this case by two successive matrix-vector products given by (14a) (14b) and the input-matrix and intermediate-matrix are given by (15) and (16), shown at the bottom of the page. (11) (15)
  • 7. 126 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 61, NO. 1, JANUARY 2014 Fig. 6. Proposed block-based structure for non-separable 2-D FIR filter for block-size and filter-size . IV. PROPOSED STRUCTURES In this Section, we have derived two separate structures for block implementation of non-separable and separable 2-D FIR filters. A. Block-Based Structure for Non-Separable 2-D FIR Filter The computation of (8) and (10) are mapped into a fully- direct parallel structure to derive the proposed block-based structure for non-separable 2-D FIR filter. The proposed struc- ture is shown in Fig. 6 for filter-size and block-size . It consists of one memory-module and one arithmetic module. 1) Memory-Module Design: The memory-module of Fig. 6 is comprised of one shift-register array and input-reg- ister units (IRUs). The shift-register array further consists of shift-registers of words each, where . Proposed structure receives a block of input samples and computes a block of outputs in each cycle. All the samples of each input-block belong to the same row and the inputs are fed to the structure block-by-block and then row-by-row in serial order. The input-block corresponding to the -th row of input image is fed to the structure during -th cycle of -th set of cycles, and the entire image is fed in cycles for . shift-registers of the shift-register array are arranged in groups referred to as SR-block and each SR-block has shift-registers. As shown in Fig. 6, SR-1,SR-2,SR-3,SR-4 constitute SR-block-1 and SR-25,SR-26,SR-27,SR-28 constitute SR-block-7. One SR-block stores one input row and the shift-register array Fig. 7. Internal structure of -th input register unit (IRU) for and . stores rows of input. SRs of SR-block are connected such that a block of samples transfer from one SR-block to the adjacent SR-block to its right after every cycle. Therefore, a block of inputs of a particular row are obtained from each SR-block in every cycle, and input-blocks corresponding to consecutive input rows are obtained from the shift-register unit in every cycle. Current input-block and past input-blocks avail- able in the shift-register array are sent to IRUs to generate input-matrix of size . During -th cycle of each set of cycles, the first IRU receives the current block of input , and the -th IRU receives an input-block from -th SR-block. The -th IRU generates the input-matrix , for . According to (11), a block of consecutive samples of -th row of input are required to construct the matrix . Each IRU receives samples from the corresponding SR-block during -th cycle and uses past samples belonging to past input-blocks. The in- ternal structure of -th IRU is shown in Fig. 7 for and . It consists of registers, and produces number of -point input-vectors , for cor- responding to rows of . 2) Arithmetic-Module Design: The arithmetic module is comprised of functional-units (FUs) and one adder tree (AT). In each cycle, FUs of arithmetic module receive sets of input-vectors from IRUs of storage-module such that -th FU receives input-vectors generated by -th IRU and it performs separate inner-product computation with the -th row of impulse-response matrix to obtain -point partial output-vector according to (10). Internal structure of the FU is shown in Fig. 8(a). It consists of (16)
  • 8. MOHANTY et al.: MEMORY FOOTPRINT REDUCTION FOR POWER-EFFICIENT REALIZATION 127 Fig. 8. (a) Structure of -th functional unit (FU) for and . (b) Structure of adder-block for and . Fig. 9. Internal structure of inner-product cell (IPC) for . inner-product cells (IPCs). Each IPC (shown in Fig. 9) per- forms -point inner product of input-vector and weight-vector. Finally, partial-output vectors of FUs are added together in an adder-block (shown in Fig. 8(b)) according to (8) to obtain one block of output in every cycle, where one clock period , and are, respectively, one multiplication time, addition time and one full-adder delay. One complete row of output is obtained in cycles and the entire output matrix of size in cycles. B. Block-Based Structure for Separable Filter The proposed block-based structure for separable 2-D FIR filter is shown in Fig. 10. It consists of one processing cell (PC) and one transposition-unit (TU), where a PC consists of two FUs. Structure of each FU is the same as the one shown in Fig. 8(a) except that a constant vector is stored in each FU. In this case, FU-1 and FU-2 store the constant-vectors and , respectively. It processes the -th block of input of the -th row of during the -th cycle of a set of cycles, and produces an output block , where and is the input block-size. One complete row of is pro- cessed in cycles and the complete image in cycles. Pro- Fig. 10. Proposed block-based structure for separable 2-D FIR filter for and . posed structure receives a block of input samples through number of -point input-vectors where each of the input-vectors is overlapped by samples. Input-vectors of -th and -th cycles of the -th set input cycles are shown in Fig. 10 for and . Components of each input-vectors are shown in the rectangular box adjacent to its left. The input-vectors of -th cycle form the matrix as given in (15), where is the -th row of , for , and . rows of are fed to IPCs of FU-1 in parallel to perform one matrix-vector multiplication with the constant vector to calculate an -point intermediate-vector according to (14a). From each intermediate-vector, one intermediate-matrix (as given in (15)) is generated, such that is generated from . TU generates the required rows of from . We can find from (11) and (16) that the elements of and satisfy similar property. Therefore, TU performs the same function as the IRUs and its structure is identical with that of an IRU (as shown in Fig. 7). The TU of separable structure gener- ates rows , for ) of in parallel and feeds those to FU-2 in parallel to perform one matrix-vector product with constant-vector to compute a block of filter output according to (16b). V. GENERIC STRUCTURES In this Section, we derive two separate generic structures for non-separable and separable filter banks. Also we have pro- posed a unified structure for realization of 2-D FIR filter-bank comprised of non-separable and separable filters. A. Generic Block-Based Structure for Non-Separable Filters The coefficient matrices of non-separable filters can have va- rieties of symmetry, e.g., diagonal, centro, four-fold rotational, quadrant etc. These symmetries can be exploited to realize the transfer functions with lesser number of multiplications. The arithmetic module of proposed structure for non-separable fil- ters could be optimized to take advantage of these symmetry property. The storage-module of the structure can be interfaced
  • 9. 128 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 61, NO. 1, JANUARY 2014 Fig. 11. Proposed generic block-based structure for non-separable 2-D FIR fil- ters, is the filter size and is the input block-size. Fig. 12. Proposed generic block-based structure for separable 2-D FIR filters, is the filter size and is the input block-size. as a common unit with arithmetic modules of filters with and without symmetry. This results in a generic structure shown in Fig. 11. Each sub-module of arithmetic-module of generic struc- ture represents arithmetic module of a constituent filter. The arithmetic-module of each filter is enabled with a select signal (EN , for ) to switch off the arithmetic module to have power saving if the output of a particular filter is not re- quired. The proposed generic structure can be used to realize any of the four types of filters or a parallel combination of fil- ters by selecting the arithmetic modules of respective filters through the select signals. The proposed generic structure has higher MRE and MBS than the non-separable structure for a given block-size and filter-size due to common storage-unit. The area-delay-product (ADP) and power consumed per output (PPO) of the proposed generic structure in parallel configuration is expected to be significantly less than separate implementation of individual filters. B. Generic Block-Based Structure for Separable Filters The proposed generic structure for the realization of sepa- rable filters with and without symmetry is shown in Fig. 12. The structure is similar to proposed generic non-separable structure except that the shift-register array is only common with PCs of the processing unit. Proposed generic structure can be used to realize a separable filter with and without symmetry in parallel. Fig. 13. Data-flow of common shift-register array and data distribution block for non-separable and separable generic structures. C. Unified Structure for 2-D FIR Filter-Bank The shift-register size and input to the shift-register array are the same in the proposed separable and non-separable generic structures. NL input samples are obtained from the shift-reg- ister array and fed to the IRU-array of generic non-separable structure as blocks of samples each, while the processing unit of the generic separable structure is fed with blocks of samples each. Therefore, the input-blocks of both non-separable and separable filters can be obtained from the same shift-reg- ister array. Outputs of the common shift-register array need to be rearranged appropriately for the non-separable and separable generic structures. Data rearrangement of a common shift-register array is shown in Fig. 13. The data-flow of an SR-block is shown in blue color. The input-blocks shifting through the SR-block are shown for the -th and -th cycle. The contents of the first two locations of each SR of the SR-block are shown for the -th and -th input cycles of the -th input row. The rectangular dotted boxes show the content of one cell of the SR-block comprised of 4 SRs corresponding to . A direct flow of data from the shift-register unit meet the data-flow requirement of non-separable generic structure while shift-reg- ister output data are rearranged (shown by the data-distribution block in Fig. 13) for the separable generic structure. The proposed unified structure for 2-D FIR filter is shown in Fig. 14. It has a common storage unit for both separable and non-separable sections which can be used for the realiza- tion of any one of four types of non-separable or two types of separable filters. It also can be used for parallel realization of any combination of separable or non-separable. It involves memory-words and computes outputs each of all the six filters in every cycle in its full-parallel con- figuration. Although we have shown the unified structure for 6 parallel filters, it can be used for realization of more than 6 filters without any additional storage. The processing module complexity (arithmetic-modules and PCs of each filter) only in- creases proportionately with the number of parallel filters. This is a major advantage for area-delay-power efficient realization
  • 10. MOHANTY et al.: MEMORY FOOTPRINT REDUCTION FOR POWER-EFFICIENT REALIZATION 129 Fig. 14. Proposed block-based unified structure for 2-D FIR filter. of large size filter banks consisting of separable and non-sepa- rable filters. VI. COMPLEXITIES AND PERFORMANCE CONSIDERATIONS The proposed structure for non-separable filter consists of a storage-module and an arithmetic module, while the proposed separable structure consists of one shift-register array and one processing cell, where the processing cell again consists of both arithmetic and storage components. The storage module of non- separable structure consists of one shift-register array and IRUs. The shift-register array of both non-separable and sepa- rable structures consists of SRs of words each. Each IRU consists of registers. The arithmetic module of non-separable structure consists of FUs and one adder-block while the processing cell of separable structure consists of two FUs and one TU (same as the IRU). Each FU consists of IPCs, and each IPC consists of multipliers and one adder-tree (AT) to add words. The adder-block consists of such ATs. A. Complexity of Block-Based Structures for Separable and Non-Separable Filters The arithmetic-module of block-based non-separable struc- ture involves multipliers, adders while each processing cell of separable structure involves multipliers, adders and registers. Apart from these, the non-separable structure involves registers and the separable structure involves regis- ters. Both these structures compute outputs per cycle, where one cycle period is and , re- spectively for non-separable and separable structure, for and are, respectively, one multiplier delay, adder delay and full-adder delay2. 2The delay of AT increases by after each level of the tree. Since, we have considered a (4 4) 2-D FIR filter to synthesize the proposed design, it requires an AT of two levels. So the delay introduced by the adder tree becomes . Therefore, the critical path is not affected much. For large order filter one can use a pipeline adder-tree (PAT) instead. For simplicity if we can consider the multiplier consisting of ripple carry adders, the multiplication time is twice that of an addition time . To maintain a critical path of one , then we need to introduce a pipeline stage after the first level of adder tree using registers. Upto a filter of order 511 we do not require any extra pipeline stage for word length 8 bit or more. Fig. 15. (a) Memory reuse efficiency (MRE). (b) Memory band-width per output (MBWPO). NS, S, NSG, SG and UNI, respectively, stands for non-sep- arable, separable, non-separable generic, separable generic and unified. TABLE IV GENERAL COMPARISON OF MRE AND MBWPO OF PROPOSED STRUCTURES. : INPUT-IMAGE WIDHTH/HEAIGHT, : FILTER-SIZE, : INPUT-BLOCK LENGTH B. Complexity of Generic Structures Arithmetic module of non-separable generic structure com- prises four sub-modules corresponding to four types of filters where each sub-module represents arithmetic-module of the corresponding filter. Sub-modules of symmetric filters involves the same number of adders as those of sub-module of general filter which is , and only differ by the number of mul- tipliers. Sub-module of diagonal, four-fold and quadrant sym- metry filters, respectively, involves and multipliers. Arithmetic-module of non-separable generic structure, therefore, involves multi- pliers and adders. Processing unit of proposed separable generic structure is comprised of two processing cells (one general and one symmetric filter). It involves
  • 11. 130 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 61, NO. 1, JANUARY 2014 TABLE V COMPARISON OF HARDWARE- AND TIME-COMPLEXITY OF THE PROPOSED STRUCTURES AND EXISTING STRUCTURES TABLE VI COMPARISON OF HARDWARE- AND TIME-COMPLEXITY OF PROPOSED NON-SEPARABLE GENERIC AND UNIFIED STRUCTURES AND EXISTING UNIFIED STRUCTURE OF [17] Fig. 16. Normalized storage per filter output (calculated for and , NS, S, NSG, SG and UNI, respectively, stands for non-separable, separable, non-separable generic, separable generic and unified. multipliers, adders and registers. The proposed non-separable generic structure, therefore, involves multipliers and adders and registers. It computes outputs of each of the four filters in every cycle. Similarly, the proposed separable generic structure involves multipliers, adders and registers, and computes outputs of each pair of filters in every cycle. The proposed unified structure has one storage unit and one processing module which is comprised of one non-separable section and one separable section. The non-separable section represent arithmetic-module of proposed non-separable generic structure whereas the separable section represents one processing unit of separable generic structure. Complexity of storage unit is the same as those of proposed block-based non-separable structure. The proposed unified structure, therefore, involves multipliers, TABLE VII HARDWARE AND TIME COMPLEXITIES OF PROPOSED GENERIC AND UNIFIED STRUCTURES AND EXISTING UNIFIED STRUCTURE OF [17] FOR INPUT-IMAGE SIZE , FILTER-SIZE adders and registers, and computes filter outputs of each of six filters (four non-separable and two separable) in every cycle. C. Memory Reuse Efficiency and Bandwidth Requirement Using the definition of MRE and MBWPO given in Section II, we have calculated MRE and MBWPO of pro- posed structures according to expressions given in Table IV. Using these expressions, we have estimated MRE and MBWPO of the proposed structure for filter-size and image-size . The estimated values are shown in graphs of Fig. 15. The MRE of proposed structures increases with block-size. MRE of unified structure is higher than others for a given block-size. This is mainly due to memory-sharing by more filters in the unified structure. D. Performance Comparison In [16], an efficient 2-D IIR filter (pole-zero) structure is presented which can be modified for realization of 2-D FIR
  • 12. MOHANTY et al.: MEMORY FOOTPRINT REDUCTION FOR POWER-EFFICIENT REALIZATION 131 TABLE VIII COMPARISON OF POST-LAYOUT SYNTHESIS RESULTS OF THE PROPOSED STRUCTURES AND THE STRUCTURES OF [16], [17] AND [29] FOR AND , USING TSMC 90 NM CMOS TECHNOLOGY LIBRARY, POWER ESTIMATED AT 20 MHZ FREQUENCY filter by removing the all-pole structure from the pole-zero structure. Accordingly, we have extracted an FIR filter struc- ture from [16] and synthesized that for comparison. Hardware complexity of these extracted structure along with those of proposed structures and structure of [11] and [29] are listed in Table V for comparison of complexities. We find from Table V that proposed non-separable structure involves times more multipliers and adders than the existing structures and equal number of registers, but it offers times higher throughput and times less MBWPO than others. Similarly, proposed sepa- rable structure involves times more multipliers and adders than those of [29] and the same number of registers, but offers times higher throughput and nearly 2 times less MBWPO than those of [29]. Proposed separable generic structure involves times more multipliers and times more adders than those of [29] and more registers, and offers times higher throughput with nearly 4 times less MBWPO than those of [29]. We have extracted a unified 2-D FIR filter structure from the multimodal structure of [17] for comparison purpose. The hardware and time complexities of this structure and the pro- posed non-separable generic and unified structure are listed in Table VI. Compared with structure of [17], proposed non-sep- arable generic structure and unified structure involve nearly times more multiplier times more adders and compute and times more outputs, respectively. Proposed generic structure involves less registers than that of [17] and it has nearly times less MBWPO than other. Similarly, the proposed unified structure involves less registers and nearly times less MBWPO. We have estimated hardware complexity of proposed non- separable generic and unified structures and the unified struc- ture of [17] for , block-size , 8 and for image-size . The estimated values are listed in Table VII. We can find from Table VII that, proposed non-separable generic structure involves 13.6% less normalized multipliers, 34.7% less normalized adders and 90% less MBWPO than those of [17]. Proposed unified structure involves 24.27% less normal- ized multipliers, 47.82% less normalized adders and 94% less MBWPO than those of [17]. Unlike the existing unified struc- ture, the proposed one does not require any steering logics for signal switching. Note that, the proposed non-separable generic structure is comprised of one general filter and three symmetric filters, while the proposed unified structure is comprised of one general non-separable and one general separable filter, and four symmetric filters. But, unified structure of [17] is comprised of only four symmetric filters. Proposed structures have several advantage over the existing structures, (i) provide filter-banks for non-separable and/or separable filters with symmetry and/or without symmetry which could be used in many image pro- cessing applications, (ii) can be configured for implementation of any one filter of the filter-bank or parallel configuration of any of the filters of the filter-bank, and (iii) can be easily scaled for high-throughput. We have estimated normalized storage per filter output (SPO) of proposed structures and the existing structure of [11], [16], [17], [29] for and and shown in the graph of Fig. 16. We find that proposed structures involves less normalized SPO than the existing structure. This is mainly due to memory-reuse efficiency and memory-sharing of the pro- posed structures. E. ASIC Synthesis Result We have coded the proposed designs in VHDL for filter size , block-size and image-size (512 512), and synthesized using Synopsys tool. We have also synthesized non-separable 2-D FIR filter structures extracted from IIR structure of [16] and unified structure of [17] and separable structure of [29]. We have used Wallace-tree based generic Booth-multiplier of Synopsys DesignWare building blocks library for all the designs. Shift-registers and registers of all designs are synthesized using D-FF (flip-flop). We have consid- ered input signal width -bit and all intermediate signals and output signal width -bit. We have set switching
  • 13. 132 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 61, NO. 1, JANUARY 2014 activity toggle rate 0.25 and static probability is 0.5. The netlist file obtained from the Synopsys Design Compiler is processed in IC Compiler. After place, route and clock synthesis area, time and power (leakage and dynamic) reported by the IC Compiler are listed in Table VIII for comparison. Power consumption of all the synthesized designs are estimated for 20 MHz clock. Due to memory footprint reduction, the area, leakage power and dynamic power consumption of the proposed structures do not increase proportionately with the number of outputs. Consequently, proposed structures involve significantly less area-delay-product (ADP3) and consume less energy per output (EPO4). Compared with [16], proposed generic non-separable structure involves 11.25 times less ADP and 15.63 times less EPO. The proposed generic separable structure involves 4.85 times less ADP and 5.53 times less EPO than those of [29]. Compared with [17], proposed unified structure involves 4.64 times less area and 9.78 times less EPO. VII. CONCLUSION We have analyzed memory footprint and combinational complexity of 2-D FIR structures to arrive at a systematic design strategy to derive area-delay-power-efficient architec- tures. Based on that we have presented novel block-based separable and non-separable structures, generic structures for separable and non-separable filter-banks and unified structure for concurrent realization of both separable and non-separable filter-banks. It is shown that storage requirement of proposed structures does not change with input block-size . Similarly, the storage size of generic non-separable structure is indepen- dent of the number of parallel filters in a filter-bank. It increases marginally by words in case of generic separable and unified structures, where is the filter size. Proposed structures, therefore, offer higher memory reuse efficiency and MBW reduction for higher values of and . This reduces SPO and PPO of proposed structures. Compared with the existing structures, proposed block-based separable and non-separable structures involve the same number of storage words, and proportionately the same or less number of arithmetic resources than the corresponding best of existing structures, and compute PL times more filter outputs per cycle with PL times less MBW. The proposed unified structure with 4 non-separable filters and 2 separable filters involve nearly times more multipliers, times more adders, less registers than existing unified structure, and computes times more filter outputs per cycle with times less MBW than the other. ASIC synthesis result for filter-size (4 4), input-block size and image-size (512 512) shows that proposed block-based non-separable and generic non-separable structures, respectively, involve 5.95 times and 11.25 times less ADP, and 5.81 times and 15.63 times less EPO than the corresponding existing structures. The proposed unified structure involves 4.64 times less ADP and 9.78 times less EPO than the corresponding existing structure. When the inner-product computation involved in FIR filtering 3ADP area data-arrival time(DAT)/outputs per cycle 4power clock-period/number of outputs per cycle is realized by multiplier-less designs to reduce the combina- tional-complexity that may require additional memory leading to overall increase in memory complexity. But, the advantage gained by the proposed memory footprint reduction technique is not affected by such change of method of implementation of inner-products. REFERENCES [1] D. E. Dudgeon and R. M. Mersereau, Multidimensional Digital Signal Processing. Englewood Cliffs, NJ, USA: Prentice-Hall, 1984. [2] M. A. Sid-Ahmed, Image Processing: Theory, Algorithms, and Archi- tectures. New York, NY, USA: McGraw-Hill, 1995. [3] T. Barbu, “Gabor filter based face recognition technique,” in Proc. Rmanian Acad. Ser. A, 2010, vol. 11, no. 3/2010, pp. 277–283. [4] S. E. Grigorescu, N. Petkov, and P. Kruizinga, “Comparison of texture features based on Gabor filters,” IEEE Trans. Image Process., vol. 11, no. 10, pp. 1160–1167, Oct. 2002. [5] W. Li, K. Mao, H. Zhang, and T. Chai, “Designing compact Gabor filter banks for efficient texture feature extraction,” in Proc. 11th Int. Conf. Contr., Automation, Robotics Vis., Singapore, Dec. 7–10, 2010, pp. 1193–1197. [6] M. A. Sid-Ahemed, “A systolic realization for 2-D digital filters,” IEEE Trans. Acoustic Speech Signal Process., vol. 37, pp. 560–565, Apr. 1989. [7] N. R. Sanbhag, “An improved systolic architecture for 2-D digital fil- ters,” IEEE Trans. Signal Process., vol. 39, pp. 1195–1202, May 1991. [8] B. K. Mohanty and P. K. Meher, “Cost-effective novel flexible cell-vlevel systolic architecture for high-throughput implementation of 2-D FIR filters,” Proc. IET Comput. Digit. Tech., vol. 143, no. 6, pp. 436–439, Nov. 1996. [9] B. K. Mohanty and P. K. Meher, “High-throughput and low-latency implementation of bit-level systolic architecture for 1-D and 2-D digital filters,” IET Proc. Computer Digital Technique, vol. 146, no. 2, pp. 91–99, Mar. 1999. [10] L. D. Van, C. C. Tang, S. Tenqchen, and W. S. Feng, “New VLSI ar- chitecture without global broadcast for 2-D systolic digital filters,” in Proc. IEEE Int. Symp. Circuis Syst. ISCAS, Geneva, Switzerland, May 2000, vol. 1, pp. 547–550. [11] L. D. Van, “A new 2-D systolic digital filters architecture without global broadcast,” IEEE Trans. Very Large Scale Integr. Syst., vol. 10, no. 4, pp. 477–486, Aug. 2002. [12] H. C. Reddy, I. H. Khoo, and P. K. Rajan, “2-D symmetry: Theory and filter design applications,” IEEE Circuits System Mag., vol. 3, no. 3, pp. 4–33, 3rd Q 2003. [13] P. Y. Chen, L. D. Van, H. C. Reddy, and C. T. Lin, “A new VLSI 2-D diagonal-symmetry filter architecture design,” in Proc. IEEE APCCAS, Macao, China, Nov. 2008, pp. 320–323. [14] I. H. Khoo, H. C. Reddy, L. D. Van, and C. T. Lin, “2-D digital filter ar- chitectures without global broadcast and some symmetry applications,” in Proc. IEEE Int. Symp. Circuis Syst. ISCAS, May 2009, pp. 952–955. [15] P. Y. Chen, L. D. Van, H. C. Reddy, and C. T. Lin, “A new VLSI 2-D fourfold-rotational-symmetry filter architecture design,” in Proc. IEEE Int. Symp. Circuis Syst. (ISCAS), May 2009, pp. 93–96. [16] I. H. Khoo, H. C. Reddy, L. D. Van, and C. T. Lin, “Generalized formu- lation of 2-D filter structures without global broadcast for VLSI imple- mentation,” in Proc., IEEE MWSCAS, Seattle, WA, USA, Aug. 2010, pp. 426–529. [17] P. Y. Chen, L. D. Van, I. H. Khoo, H. C. Reddy, and C. T. Lin, “Power- efficient cost-effective 2-D symmetry filter architecture,” IEEE Trans. Circuit Syst. I, Reg. Papers, vol. 58, no. 1, pp. 112–125, Jan. 2011. [18] K. K. Parhi, VLSI Digital Signal Processing. New York, NY, USA: Wiley, 1998. [19] C. Y. Yao, H. H. Chen, C. J. Chien, and C. T. Hsu, “A novel common- subexpression elimination method for synthesizing fixed-point FIR fil- ters,” IEEE Trans. Circuit Syst. I, Reg. Papers, vol. 51, no. 11, pp. 2215–2221, Nov. 2004. [20] Y. Voronenko and M. Puschel, “Multiplierless multiple constant mul- tiplication,” ACM Trans. Algorithms, vol. 3, no. 2, May 2007, Article 11. [21] A. P. Vinod and E. M.-K. Lai, “An efficient coefficient-partitioning algorithm for realizing low complexity digital filters,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 24, no. 12, pp. 1936–1946, Dec. 2005.
  • 14. MOHANTY et al.: MEMORY FOOTPRINT REDUCTION FOR POWER-EFFICIENT REALIZATION 133 [22] R. Mahesh and A. P. Vinod, “A new common subexpression elimina- tion algorithm for realizing low complexity higher order digital filters,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 27, no. 2, pp. 217–219, Feb. 2008. [23] J. Luis, T.-X. , R. M. A.-P. , and M. Bayoumi, “Hybrid multiplierless FIR filter architecture based on NEDA,” in Proc. IFIP Int. Conf. Very Large Scale Integration (VLSI-SoC 2007), 2007, pp. 316–319. [24] P. K. Meher, S. Chandrasekaran, and A. Amira, “FPGA realization of FIR filters by efficient and flexible systolization using distributed arith- metic,” IEEE Trans. Signal Process., vol. 56, no. 7, pp. 3009–3017, Jul. 2008. [25] P. K. Meher, “New approach to look-up table design and memory- based realization of FIR digital filters,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 57, no. 3, pp. 592–603, Mar. 2010. [26] A. Croisier, D. J. Esteban, M. E. Levilion, and V. Rizo, “Digital Filter for PCM Encoded Signals,” U.S. Patent 3777130, Apr. 12, 1973. [27] S. A. White, “Applications of the distributed arithmetic to digital signal processing: A tutorial review,” IEEE ASSP Mag., vol. 6, no. 3, pp. 5–19, July 1989. [28] R. I. Hartley, “Subexpression sharing in filters using canonic signed digit multipliers,” IEEE Trans. Circuits Syst. II, Analog Digital Signal Process., vol. 43, no. 10, pp. 677–688, Oct. 1996. [29] B. K. Mohanty and P. K. Meher, “New scan method and pipeline archi- tecture for VLSI implementation of separable 2-D FIR filters without using transposition,” in Proc. IEEE Region 10 TENCON2008 Conf., Hyderabad, India, Nov. 2008. Basant K. Mohanty (M’06–SM’11) received the M.Sc. degree in physics from Sambalpur University, India, in 1989 and the Ph.D. degree in the field of VLSI for Digital Signal Processing from Berhampur University, Orissa, in 2000. In 2001 he joined as Lecturer in EEE Department, BITS Pilani, Rajasthan. Then he joined as an Assis- tant Professor in the Department of ECE, Mody Insti- tute of Education Research (Deemed University), Ra- jasthan. In 2003 he joined Jaypee University of En- gineering and Technology, Guna, Madhya Pradesh, where he became Associate Professor in 2005 and full Professor in 2007. His research interest includes design and implementation of low-power and high- performance systems for adaptive filters, image and video-processing appli- cations, secured communication and reconfigurable architectures. He has pub- lished nearly 50 technical papers. Dr. Mohanty is serving as Associate Editor for the Journal of Circuits, Sys- tems, and Signal Processing. He is a life time member of the Institution of Elec- tronics and Telecommunication Engineering, New Delhi, India, and he was the recipient of the Rashtriya Gaurav Award conferred by India International friend- ship Society, New Delhi for the year 2012. Pramod Kumar Meher (SM’03) received the M.Sc. degree in physics and the Ph.D. degree in science from Sambalpur University, India, in 1978, and 1996, respectively. Currently, he is a Senior Scientist with the Institute for Infocomm Research, Singapore. Previously, he was a Professor of Computer Applications with Utkal University, India, from 1997 to 2002, and a Reader in electronics with Berhampur University, India, from 1993 to 1997. His research interest includes design of dedicated and reconfigurable architectures for computation-intensive algorithms pertaining to signal, image and video processing, communication, bio-informatics and intelligent com- puting. He has contributed nearly 200 technical papers to various reputed journals and conference proceedings. Dr. Meher has served as a speaker for the Distinguished Lecturer Program (DLP) of IEEE Circuits Systems Society during 2011 and 2012 and Associate Editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: EXPRESS BRIEFS during 2008 to 2011. Currently, he is serving as Associate Editor for the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, the IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, and Journal of Circuits, Systems, and Signal Processing. Dr. Meher is a Fellow of the Institution of Electronics and Telecommunication Engineers, India. He was the recipient of the Samanta Chandrasekhar Award for excellence in re- search in engineering and technology for 1999. Somaya Al-Maadeed (SM’12) received the B.Sc. degree in computer science from Qatar University, Qatar, in 1994, the M.Sc. degree in mathematics and computer science from Alexandria University, Egypt, in 1999, and the Ph.D. in computer science from Nottingham University, Nottingham, U.K., in 2004. Following her Ph.D., she worked as an Assistant Professor with Qatar Uni- versity, where she did research in the areas of biometrics, digital filters, speech recognition, image processing, and document management. She has published around forty papers in the above general areas. Abbes Amira (M’01–SM’07) received the Ph.D. degree in the area of electronic and computer engineering from Queens University Belfast, U.K., in 2001, developing a coprocessor for matrix algo- rithms using FPGAs. He is a Full Professor in visual communication at the University of the West of Scotland (UWS), Scot- land. Prior to joining UWS, he took academic po- sitions at the university of Ulster-UK as Reader in embedded systems in the Nanotechnology and Inte- grated Bio-Engineering Centre (NIBEC); Associate Professor in Embedded Systems in the department of electrical engineering at Qatar University-Qatar, Senior lecturer at Brunel University-UK within the di- vision of Electronic and Computer Engineering and a lectureship in Computer Engineering at Queens University. He has been awarded a number of grants from government and industry and has published around 200 publications in the area of reconfigurable computing and image and signal processing during his career to date. Twelve Ph.D. students have successfully completed their Ph.D. under his supervision. He took consultancy positions with many companies in UK, and he holds two visiting professor positions at the University of Nancy, Henri Poincare, France and University of Tunn Hussein Onn, Malaysia. His re- search interests include: embedded systems, high performance reconfigurable computing, image and video processing, multi-resolution analysis, biometrics and connected health applications. Prof. Amira has been invited to give talks, short courses and tutorials at uni- versities and international conferences and being chair, program committee for a number of conferences. He was one of the tutorials presenters at ICIP 2009, Conference Chair of ECVW 2011, Program Chair of ECVW2010, Program Co-Chair of ICM12, DELTA 2008, IMVIP 2005. He is also one of the 2008 VARIAN prize recipients. He has been an external examiner for many Universi- ties in UK, Hong Kong, Australia and Malaysia. He was one of the guest editors for the Special Issue in the Pattern Recognition Journal, titled Feature Gener- ation and Machine Learning for Robust Multimodal Biometrics, March 2008. He is a Fellow of IET, Fellow of the Higher Education Academy, and Senior member of ACM.