SlideShare a Scribd company logo
1 of 7
Download to read offline
VLSI Architecture Design for Particle Filtering in
Real-time
A. Pasciaroni∗†, J. A. Rodr´ıguez†, F. Masson∗†, P. Juli´an∗†, E. Nebot‡
∗Dep. Ing. El´ectrica y Computadoras, Universidad Nacional del Sur
Av. Alem 1253, Bah´ıa Blanca, Argentina
†CONICET, Argentina
‡Australian Centre for Field Robotics, University of Sydney, Australia
Abstract—Particle Filter is an algorithm that provides system
state estimation even for non-linear and non-gaussian systems.
For applications that require a large number of particles, real
time constraint is hard to accomplish since the algorithm is
computationally expensive and the resampling step becomes a
bottleneck. In this work, a VLSI architecture for particle filtering
in real time is presented. The proposed design implements a
fraction of the processing using piecewise linear functions and
allocates them as global resources. In this way, a large number of
processing elements (PE) working in parallel can be instantiated
in the design. An example based on a range-only localization
using Radio-Frecuency identification (RFID) tags is developed to
illustrate the approach. The received signal strength indicator
(RSSI) is used to estimate the distance between transmitter
and receiver. A VHDL RTL model of the processing data flow
is implemented and compared to Matlab simulations showing
similar results.
Index Terms—Particle Filter, VLSI Design, RFID, RTL.
I. INTRODUCTION
Particle Filters (PF) [1] are a method to perform statistical
dynamic state estimation. The probability density function of
a given state is represented by a set of weighted entities or
particles which is updated iteratively according to sensor mea-
surements and a dynamic system model. The three main steps
of the particle filter are: sampling, update and resampling.
This last step presents high data dependency between particles,
becoming the major bottleneck in the execution time of the
filter.
There exist applications that require real-time estimation of
non-linear and non-gaussian systems as robot localization and
visual tracking [2], [3], [4]. These applications are well suited
for particle filtering but a large number of particles is required
to provide accurate estimations. Since the PF algorithm is
computationally expensive and the resampling step cannot
be fully parallelized, particle filter computation in real time
is limited by the available computational resources. In this
context, a VLSI implementation that exploits algorithm data
level parallelism will allow particle filtering at real time.
Previous works have addressed particle filter implementa-
tions for real time applications [5] [6] [7] [8]. In [5] a PF
architecture composed of multiple processing elements and a
central unit for the bearing-only tracking problem is presented
and implemented in FPGA. Particle filter steps are performed
locally on each processing element (PE). After resampling, a
central unit controls the particle exchange among processors
in order to reduce performance degradation. Several commu-
nication schemes are introduced including a fixed particle
exchange among processors. In [7] a VLSI design of the
processing element is presented which also includes a pipeline
dataflow that deals with logic blocks of variable latency. In [8]
a central unit that performs communication schemes, intro-
duced in [5], for an architecture composed of four processing
elements is designed and a VLSI implementation is presented.
In [6] a parallel pipelined design is presented. The number of
replicated pipeline stages is variable. Taking into account the
rate of each stage an optimal number of replicated stages is
determined. However, a VLSI implementation that takes full
advantage of the data level parallelism present in the algorithm,
has not been developed yet.
In this work a VLSI architecture for particle filtering in real
time applications is presented. It is composed of processing
clusters with one resampling module and an array of PE. Each
PE performs several steps of the PF operation that do not
present data dependency, in a pipelined fashion. Therefore,
if more PE can be instantiated in a given Silicon area, more
particles can be effectively processed in parallel, increasing the
throughput. Afterwards, resampling modules gather PE outputs
so that the resampling is performed in groups. In addition, to
reduce the PE area, a fraction of the PE data processing is
time-multiplexed so hardware dedicated to this processing is
instantiated once and can be shared by multiple PE.
The application chosen to illustrate the approach is target
tracking based on Received Signal Strength Indicator (RSSI)
of Radio Frequency Identification devices (RFID).
The paper is organized as follows. Section II presents the
localization framework and RSSI sensor model. The archi-
tecture and microarchitecture design is presented in section
III. Execution time of proposed architecture is presented in
section IV. Simulation results comparing the VHDL RTL and
Matlab models are presented in section V. Finally, Section VI
is dedicated to the conclusions.
II. LOCALIZATION FRAMEWORK
In sensor networks, Radio Frequency based localization
systems have gained importance in those environments where
Global Positioning based system (GPS) do not perform well
due to poor satellite availability or multiple path issues [9]
[10]. This a possible situation for the choosen target appli-
cation: trucks localization in opencast mining enviroments
0 10 20 30 40 50 60
−150
−100
−50
0
50
Two Ray Model
Distance [m]
AverageSignalStrength[dBm]
Fig. 1: Two Ray Model for a communication link of 433 Mhz in a
rural enviroment.
[9]. The RFID technology comprises the receivers, antennas
and RFID tags. The tags send their identification number to
the receivers. Making use of RSSI it is possible to estimate
the distance between a tag and a receiver since RSSI values
decrease with distance with a known law. Due to several
factors that affect propagation of electromagnetic waves in
a medium (refractions, reflections, scattering), the received
power vs distance relation varies with the obstacles in the
environment, the height and direction of the antenna and also
the power of the signal transmitted. This results in a non-
biyective and thus multimodal sensor function.
Figure 1 shows a typical two-ray propagation model of RF
signals [11] for a rural environment and a communication
frequency of 433 MHz and transmitter and receiver height
of 2.5 m. It shows the average signal strength of the received
power versus distance. For a given distance the distribution
of RF signal is considered Gaussian and its variance varies
with the signal strength [9]. It is possible to observe that for
a received power of −70 dBm there exist multiple distance
values: 8 m, 15.5 m, 20 m and 43.1 m being one of those the
true value of the tag position. This example shows the multi-
modal probabilistic density function associated with RFID
sensor.
RSSI based localization can be performed using the particle
filter algorithm. Consider a hypothetical scenario of one RFID
tag moving in 2-D and one antenna located at the origin. Let
pi
k denote the ith particle, where pi
k = x ˙x y ˙y
′
. The
target system evolution is given by
f(pi
k−1, vx, vy) =




1 ∆T 0 0
0 1 0 0
0 0 1 ∆T
0 0 0 1



 · pi
k−1+




0.5 · ∆T 2
0
∆T 0
0 0.5 · ∆T 2
0 ∆T



 ·
vx
vy
,
(1)
where vx and vy are drawn from a uniform distribution
U[0, Q].
The pseudocode of the Particle Filter algorithm for the
chosen application and for a set of N particles is described
below:
random initialization of particles;
for i ← 1 to N do
pi
k = f(pi
k−1, vx, vy); //sampling
di
= sqrt(pi
k(1)
2
+ pi
k(3)
2
);
Poti
= Fsensor(di
);
wi
=
1√
2π·σ2
· exp(−(P oti
−P otmeasurement)2
2·σ2 ) //update
end
[ ˆw, ˆp] = resampling(w, pk);
where Potmeasurement is the power measurement of the
received signal whose variance is σ2
and Fsensor(d) is the
mathematical expression of the Two ray propagation model
whose characteristic is shown in Figure 1. Depending on
the obstacles present in the enviroment a more complex
sensor model can be utilized. For the resampling step there
exist several algorithms [12], [5], [13]. Position estimation is
computed by the following equation:
˜x =
N
i=1
ˆpi
· ˆwi
(2)
III. DESIGN
A. Architecture
The PF filter algorithm does not presents data dependence
between particles except for the resampling step. When the
number of particles increases the resampling execution time
can become a bottleneck. A strategy to reduce the resampling
execution time is to divide the total number of particles into
groups so parallelism level is increased [5]. Each particle
group is processed by a dedicated processor. Since the resam-
pling step is sequentially executed, there exists a trade off be-
tween the number of processors and the estimation error: as the
number of particle groups increases, so does the degradation of
the filter [14]. In order to reduce this performance degradation
a particle exchange must be performed among processors. In
[15] an optimization of the particles exchange procedure is
presented. A formal analysis, applying the Kullback-Leibler
divergence, proves that the exchange of particles with largest
weights between adjacent processors results in better accuracy
than a random particle mixing. In [14] this exchange is
performed after resampling thus the selection of particles
with largest weight is avoided. The analysis of algorithm
parallelization has been done in [14] allowing the selection
of an optimal configuration. Once one filter iteration has been
performed, the estimate of each processor is combined in order
to provide a global estimation [15].
The system consists of two modules: the measurement unit
and the processing unit. The system block diagram is shown
in Fig. 2-a. The measurement unit sets up the RSSI value and
computes the reciprocal of the noise variance. The processing
unit performs the PF algorithm and provides an estimated
position.
In order to process thousands of particles in real-time the
processing unit architecture must exploit data level parallelism
and at the same time take into account the strategy described
above. A parallelism level hierarchy is adopted. The first
level is performed by introducing multiple processing elements
(PEs) each one performing the PF algorithm steps that do not
present data dependency. The second level consists in gath-
ering PEs in clusters so data input for the resampling step is
made up of the processed particle and weight of each PE inside
a cluster. For the final estimation of position, the estimate of
each cluster is combined as was previously mentioned. Particle
exchange among clusters is also performed.
The proposed VLSI design implements most area consum-
ing operations in external (out of the array) Look-up tables
(LUT). These LUTs are taken away from the processing
element dataflow and put them as global resources. For each
table there is a Broadcast module that sequentially reads the
table and performs interpolation. The interpolated value and
interpolation address are broadcasted to all PEs through buses.
Each PE locally computes its required interpolation address
and compares it with the current value in the bus. If an
equivalence is found, the corresponding data value is acquired
by the PE.
Figure 2-b shows a more detailed architecture of the pro-
cessing unit. It has 4 clusters with 4 PEs each. Sensor measure-
ment and the reciprocal of its variance 1/σ2
are communicated
to all PEs. Four global resources are introduced: Square,
Sqrt, Sensor and Normal LUT. Each broadcast module has
two independent buses: interpolation/address and data/bus.
Resampling, pseudo-random number generator (PRNG) and
Word-to-memory modules inside each cluster are also in-
troduced. All modules are explained in further subsections.
Communication among clusters is not shown to simplify the
diagram.
Each cluster has its own local memory and works without
data dependence of others except when the particle exchange
is performed. Processing elements belonging to a cluster share
local memory.
Regarding control logic, each cluster has its own control
logic that manages main memory reading and writing and also
global control signals. Furthermore, each processing element
that integrates a cluster has a dataflow pipeline whose control
is distributed. Since each pipeline stage has a variable delay
dependent on the time instant when the corresponding value is
present in data bus, global pipeline control is not affordable.
Therefore, each stage has a local control logic dependent on
data events.
B. Cluster Operation
Architecture cluster operation proceeds as follows: while
in execution, each PE inside a cluster reads a particle from
memory. Each Broadcast module sequentially reads its corre-
sponding LUT, interpolates and broadcasts interpolated value
and interpolation address to all PEs. Since the PE dataflow
is pipelined, a single table read is utilised to process several
particles. Main memory has two ports so memory reading and
writing is performed simultaneously.
Two arrays, one made up of particles and another one
of processed weights from each PE are the input for the
Resampling module. Once the arrays have been totally up-
dated, resampling is performed. The elements of resampling
Fig. 2: a) Block Diagram of the VLSI architecture for proposed
tracking system, b) architecture of the processing unit.
arrays are processed sequentially. As soon as one element is
resampled, it is immediately updated by corresponding PE.
Once all data from local memory has been processed
communication among clusters is performed.
C. LUTs design
The functions implemented in the LUTs are: square, square
root, two ray propagation model (as shown in Fig. 1) and
normal distribution. All of them are evaluated with a piecewise
linear function with uniform segmentation. By performing
interpolation, a reduction in table size is achieved. At a point
x ∈ [a , b], a linear interpolation is calculated as follows:
˜f(x) =
f(b) − f(a)
b − a
· (x − a) + f(a) (3)
This operation is performed by the broadcast module shown
in Fig. 3. A counter generates 2N+M
words where the N most
significant bits are used for LUT addressing and the remaining
M bits for interpolation.
The introduced dataflow is composed of several tabulated
functions and interpolations in cascade. When the interpolated
value from a broadcast module is captured by the correspond-
ing pipeline stage, it becomes the interpolation address for the
next tabulated function. It is desirable to find an appropriate
word length for LUT addressing, interpolation and function
value quantization. This length should maximise the ratio
between interpolation address word length and interpolated
value word length. At the same time, the approximation errors
Fig. 3: Broadcast Module
TABLE I: Piecewise Linear Function Setup
F unctionN M Q R S Size
Kbits
Range X Interp.
Error
Square 9 2 14 17 - 7 [0,40] 5 · 10−4
Sqrt 10 2 11 13 5 11 [0,3200] 3 · 10−4
Sensor 10 2 10 12 1 10 [0,113] 4 · 10−4
Normal 9 1 10 11 3 5 [0,5] 5 · 10−3
should be reduced since they are propagated through the
dataflow. In this regard, the accuracy analysis introduced in
[16] for practical implementation of piecewise linear functions
is adopted. Table I shows the setup chosen for each piecewise
linear function implementation where N, M, Q are the number
of bits assigned for segmentation, interpolation and function
value quantization. R and S are output data resolution and
discarded input bits. The error introduced by each interpola-
tion, which is calculated as the median of the absolute error
over one thousand samples of evaluation interval, i.e.,
error(x) = mean(|
f(x) − finterp(x))
f(x)
|) (4)
is also included in the table.
The normal distribution implementation requires to evaluate
normal distributions with different values of variance. Any
normal distribution can be obtained from the standard normal
distribution. If a distribution with mean µ and variance σ2
must be evaluated for a value t, the following equations allow
the calculation using only the standard normal distribution
function:
z =
t − µ
σ
, (5)
pNormal =
1
σ
· pStandardNormal(z), (6)
where
PStandardNormal(z) =
1
√
2 · π
· exp(
−z2
2
). (7)
Moreover, as the function is symmetric around the mean,
there is only need to store half of the evaluation interval,
reducing even more the LUT size.
The architecture comprises dual port memories thus the two
values for interpolation can be obtained simultaneously.
Fig. 4: PE Micro-architecture
D. PE Micro Architecture
Each PE sequentially performs the two algorithm steps that
do not present data dependency: sampling and update. Pro-
cessing is divided into several modules in order to implement
a level module pipeline: Sampling, Acquisition Square Value,
Acquisition Sqrt Value, Acquisition Sensor Value and Acqui-
sition Normal Value. Figure 4 shows the pipelined dataflow
microarchitecture.
1) Sampling Unit: The sampling unit processes data from
main memory current location. Memory word datawidth is 48
bits where each particle component has 12 bits. Range for
position and velocity is [−40, 40] m and [−25, 25] m/s. This
unit performs a translation in the plane by using a simplified
version of the dynamic model detailed in (1). This simplifica-
tion allows a reduction in the number of multiplications. For
this design the dynamic model is fixed but future designs will
consider a programable model. The translated positions and
velocities are computed as follows
px(k) = px(k − 1) + vx(k − 1) · △T +
1
2
· nx (8)
py(k) = py(k − 1) + vy(k − 1) · △T +
1
2
· ny (9)
vx(k) = vx(k − 1) + nx (10)
vy(k) = vy(k − 1) + ny (11)
where nx and ny are drawn from a uniform distribution
U[0, W]. Depending on the value of the △T parameter, the
W value should be adjusted in order to provide similar
accelerations than the original model. The random noise is
generated by a linear feedback shift register [LFSR] [17]
with internal XORs of 16 bits with reconfigurable seed. This
pseudo random number generator is a shared resource inside
a cluster. Each PE takes a number at its corresponding turn.
The eight most significant bits are used fot the nx component
and the eight less significant bits for the ny component. Each
component noise is pre-multiplied by the variance value Q.
Either Q and △T registers are programmables of 8 bits length.
The output of the sampling unit has the same datawidht as
its input.
2) Acquisition Value Units: All acquisition units detect
when data input is equal to the current value in the interpola-
tion address bus. This detection is performed with a bitwise xor
operation. When an equivalence is detected the data present
in the data bus is acquired.
The Acquisition Square Value unit, performs the sum of the
inputs squared. When x or y are negative two’s complement is
performed. Thus |x| and |y| have 11-bit word length and are
compared to the interpolation address bus. Once the squared
value is captured for both components, sum is performed with
17 bits output data width. The broadcast module for the Sqrt
function provides a 12-bit interpolation address bus. Therefore
the 5 less significant bits of x2
+ y2
are discarded when
Acquisition Sqrt Value unit compares its data input with the
value present in the interpolation address bus. The same occurs
for the block Acquisition Sensor Value with the less significant
bit discarded from its input word.
The Acquisition Normal Value Unit generates a word using
(5) with µ equal to Potmeasurement. Once an equivalence is
detected, the data present in the bus is acquires. Finally it is
multiplied by the reciprocal of the standard deviation as stated
in (6). The reciprocal of the variance has 8-bit width as well as
the power measurement. In order to perform subtraction in (5),
the 5 less significant bits of the input word are discarded. The
word length after this equation is 16 bits. According to Table
I, the tabulated normal function requires 1 interpolation bit,
therefore the 6 less significant bits are not taken into account
resulting in a 10-bit word length. Once data value is captured
by the PE, it is multiplied by the reciprocal of the variance,
resulting in a 19-bit word lentgh.
E. Resampling unit
The resampling algorithm selected for implementation is the
modified Independent Metropolis Hasting (IMH) [12] which
substitutes division operation for comparison and particles
and their weights are sequentially processed. The algorithm
is summarized in the following pseudocode:
wprev = wk
1
;
for i ← 2 to NUMPART ICLES do
u ∼ U(0, 1);
if ( u · wprev > wk
i
) then
wprev = wprev; resample = 1;
else
wprev = wk
i
; resample = 0;
end
end
Algorithm 1: Implemented resampling algorithm
Figure 5 shows the architecture of resampling and word-to-
memory modules. The particle array is fulfilled with output
particles from sampling unit. Both arrays particle array
and weight array must be fully updated in order to initiate
resampling operation. First particle of the whole set is always
resampled. Subsequent particles will be stored in memory
depending on the comparison among their weight and wprev.
The random number generator is implemented with a LFSR
of 16 bits.
The resample signal controls the data stored in memory. If
value of resample is 1, the data present in the particle register
is written to memory else actual processed particle is selected
and wprev is updated.
In order to sinchronize the translated particle with the
pipeline time schedule, it must be delayed as many times as the
Fig. 5: Word-To-memory Architecture
number of pipeline stages in between the sampling unit and the
Acquisition normal value unit. Each PE reads a particle from
a memory location and, once the particle is resampled, word-
to-memory unit stores it at the same location. Since a dual
port memory is considered and the architecture is pipelined,
memory reading and writing is done simultaneously. Control
is achieved with a read address and write adress counter. The
former is dependent on control signals from sampling unit
and the latter is dependent on control signals from word-to-
memory unit.
IV. EXECUTION TIME
Since the execution time of each module is variable, each
PE will complete its processing at different times. The resam-
pling module begins operation when the first PE has finished
processing its particle. Figure 6 shows the execution time of
the dataflow for a cluster made up of two processing elements.
Pipeline delay between output data values is given by the
slowest stage. In the presented design this corresponds to
the stage with the largest interpolation bus address, since it
takes 2N+M
cycles in order to acquire the last interpolated
value. This is the case of the Sqrt function. In the worst case
execution time, a new particle is processed every 4096 cycles.
As resampling takes one cycle to process each particle, the
number of cycles to finish the resampling operation depends
on the number of PEs in a cluster. Therefore, the last element
of the resampling array will be updated every 2N+M
+ P,
where P is the number of PEs in the cluster.
V. RESULTS
A. Simulation Results
A VHDL RTL model of the processing element was de-
veloped. The implementation flow was the following: first a
fixed point Matlab implementation of the processing element
described above was generated and compared to its floating
point counterpart to prove its proper operation. Second, an
RTL model that matches the fixed point Matlab implemen-
tation was developed. At this stage of the implementation,
Fig. 6: Filter execution time.
0 2 4 6 8 10 12 14 16 18 20
0
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
Distance [m]
Weight
Matlab Model vs RTL Model
Matlab Model
RTL Model
Fig. 7: Weights vs. distance.
the RSSI measurement, the reciprocal of the noise standard
deviation and the position estimation are generated off-line.
Figure 7 shows the distribution of weights vs distance for
the floating point Matlab implementation and the RTL model,
in the case where the measured power is −55.42 dBm and
σ = 2.5 dBm. Since RSSI measurements are 8-bit quantized,
the normal distribution is also quantized. It can be noticed that
the RTL model provides similar results to the floating point
Matlab implementation.
A 2-D tracking scenario can be simulated to show the
dynamic performance. In this case, the fixed point Matlab
implementation is used, instead of the RTL model, in order
to reduce simulation time. The scenario is composed of a unit
moving at nearly constant velocity and three fixed antennas
s1, s2 and s3 placed at positions: [0, 0], [−20, 0], [0, 20],
respectively. The position of the target unit evolves with time
according to (1).
The mobile initial state is x0 =
[−8m, 12m/s, 10m, −2m/s] and ∆T = 0.1s. The total
number of particles used is 4096, which are uniformly
distributed on a region delimited by the intervals [−20m, 20m]
and [0, π] radians at the beginning of the simulation. Particle
velocities have been randomly initialised with uniform
distribution in the interval [17, 7] for ˙x and [7, −3] for ˙y.
Figure 8 shows the trajectory of the target unit (green
line) and simulation results for the Matlab model and the
RTL model, in red and black lines, respectively. Both models
provide very close results.
−8 −6 −4 −2 0 2 4 6 8
0
5
10
15
20
Floating Point Matlab Model
Fixed Point Matlab Model
Tartet trajectory
Antenna
Fig. 8: Tracking of a moving target with three antennas.
TABLE II: Synthesis Results
Module Area [µm2]
Sampling 37453
Acq. Square Value 6144
Acq. Sqrt Value 2268
Acq. Sensor Value 1932
Acq. Normal Value 13293
Total Area EP 87086
B. Synthesis Results
The RTL model of the processing element described in
section III was synthesized using Synopsis DC Compiler and
0.13µm CMOS technology. Since the array is composed of
several processing elements it is desirable to have the area
required for this basic unit. Table II shows the area of the
processing element and its modules.
VI. CONCLUSIONS
A VLSI architecture for particle filtering in real time was
presented. This architecture exploits the data level parallelism
in the algorithm and also takes into account performance
degradation due to resampling parallelization. Introducing
global resources allows an increase in concurrent hardware.
Processing dataflow was described along with a piecewise lin-
ear function implementation. An RTL model of the proposed
design was generated. Simulation shows that the architecture
correctly implements the PF adapted to the specific applica-
tion. Further work is needed to choose an optimal number of
PEs per cluster.
VII. ACKNOWLEDGMENTS
The results of this paper were partially supported by PICT
2010-2657 3D Gigascale Integrated Circuits for Nonlinear
Computation, Filter and Fusion with Applications in Industrial
Field Robotics of Agencia Nacional de Promoci´on Cient´ıfica y
Tecnol´ogica (ANPCyT) of the Argentine Ministry of Science
and Technology (MINCYT).
REFERENCES
[1] N. Gordon, D. Salmond, and A. F. M. Smith, “Novel approach to
nonlinear/non-gaussian bayesian state estimation,” IEE Proc. Of Radar
and Signal Processing, vol. 140, no. 2, pp. 107–113, 1993.
[2] M. Isard and A. Blake, “Condensation - conditional density propagation
for visual tracking,” International Journal of Computer Vision, vol. 29,
no. 1, pp. 5–28, 1998.
[3] D. Fox, “Kld-sampling: Adaptive particle filters and mobile robot
localization,” in Advances in Neural Information Processing Systems
14, vol. 2, 2001, pp. 713–720.
[4] D. F. C Kwok and M. Meila, “Real-time particle filters,” Proceedings
of the IEEE, vol. 92, no. 3, pp. 469–484, Mar 2004.
[5] M. Bolic, P. M. Djuric, and S. Hong, “Resampling algorithms and
architectures for distributed particle filters,” IEEE Transactions on Signal
Processing, vol. 53, no. 7, pp. 2442–2450, July 2005.
[6] A. C. Sankaranarayanan, A. Srivastava, and R. Chellappa, “Algorithmic
and architectural optimizations for computationally efficient particle
filtering,” IEEE transcactions on Image Processing, vol. 17, no. 5, pp.
737–748, May 2008.
[7] S.-S. Chin and S. Hong, “Vlsi design of high-throughput processing
element for real-time particle filtering,” in Signals, Circuits and Systems,
vol. 2, 2003, pp. 617–620.
[8] S. Hong, S. S. Chin, M. Boli, and P. M. Djuric, “Design and implemen-
tation of flexible resampling mechanism for high-speed parallel particle
filters,” Journal of VLSI signal processing systems for signal, image and
video technology, vol. 44, pp. 47–62, 2006.
[9] G. Kloos, J. E. Guivant, E. M. Nebot, and F. Masson, “Range based
localisation using rf and the application to mining safety,” in Proceedings
of the 2006 IEEE/RSJ International Conference on Intelligent Robots
and Systems, Oct 2006, pp. 1304–1311.
[10] S. Sanudo and F. R. Masson, “Desempe˜no del filtro de part´ıculas acotado
en una aplicaci´on de localizaci´on y seguimiento de camiones en una
explotaci´on minera,” in XIV Reunion de Trabajo en Procesamiento de
la Informacion y Control, vol. 1, 2011, pp. 712–717.
[11] H. Xia, H. L. Bertoni, L. Maciel, A. Lindsay-Stewart, and R. Rowe,
“Radio propagation characteristics for line-of-sight microcellular and
personal communications,” IEEE Transactions on Antennas and Propa-
gation, vol. 41, no. 10, pp. 1439–1447, Oct 1993.
[12] L. Miao, J. J. Zhang, C. Chakrabarti, and A. Papandreou-Suppappola,
“Algorithm and parallel implementation of particle filtering and its use
in waveform-agile sensing.” Signal Processing Systems, vol. 65, no. 2,
pp. 211–227.
[13] M. Bolic, P. M. Djuric, and S. Hong, “Resampling algorithms for particle
filters: A computational complexity perspective,” EURASIP Journal on
Applied Signal Processing, vol. 15, pp. 2267–2277, 2004.
[14] A. Pasciaroni, S. Sanudo, J. Rodriguez, F. Masson, and P. Julian,
“Modelling and analysis of parallel particle filters,” in XV Reunion de
Trabajo en Procesamiento de la Informacion y Control, vol. 1, no. 1,
2013, pp. 1–6.
[15] B. Balasingam, M. Bolic, P. Djuric, and J. Miguez, “Efficient distributed
resampling for particle filters,” in IEEE Int. Conf. on Acoustics, Speech
and Signal Processing (ICASSP), 2011, pp. 3772–3775.
[16] O. Lischitz, P. Julian, J. Rodriguez, and O. Agamennoni, “Accuracy
analysis for an on-chip digital pwl realization,” in XIV Reunion de
Trabajo en Procesamiento de la Informacion y Control, 2011, pp. 429–
434.
[17] Z. Barzilai, D. Coppersmith, and A. L. Rosenberg, “Exhaustive gen-
eration of bit patterns with applications to vlsi self-testing,” IEEE
Transactions on Computers, vol. C-32, no. 2, pp. 190–194, Feb 1983.

More Related Content

What's hot

M.E Computer Science Remote Sensing Projects
M.E Computer Science Remote Sensing ProjectsM.E Computer Science Remote Sensing Projects
M.E Computer Science Remote Sensing ProjectsVijay Karan
 
M.Phil Computer Science Remote Sensing Projects
M.Phil Computer Science Remote Sensing ProjectsM.Phil Computer Science Remote Sensing Projects
M.Phil Computer Science Remote Sensing ProjectsVijay Karan
 
M phil-computer-science-remote-sensing-projects
M phil-computer-science-remote-sensing-projectsM phil-computer-science-remote-sensing-projects
M phil-computer-science-remote-sensing-projectsVijay Karan
 
Design and Fabrication of a Two Axis Parabolic Solar Dish Collector
Design and Fabrication of a Two Axis Parabolic Solar Dish CollectorDesign and Fabrication of a Two Axis Parabolic Solar Dish Collector
Design and Fabrication of a Two Axis Parabolic Solar Dish CollectorIJERA Editor
 
A Transmission Range Based Clustering Algorithm for Topology Control Manet
A Transmission Range Based Clustering Algorithm for Topology Control ManetA Transmission Range Based Clustering Algorithm for Topology Control Manet
A Transmission Range Based Clustering Algorithm for Topology Control Manetgraphhoc
 
Latest 2016 IEEE Projects | 2016 Final Year Project Titles - 1 Crore Projects
Latest 2016 IEEE Projects | 2016 Final Year Project Titles - 1 Crore ProjectsLatest 2016 IEEE Projects | 2016 Final Year Project Titles - 1 Crore Projects
Latest 2016 IEEE Projects | 2016 Final Year Project Titles - 1 Crore Projects1crore projects
 
Emulation OF 3gpp Scme CHANNEL MODELS USING A Reverberation Chamber MEASUREME...
Emulation OF 3gpp Scme CHANNEL MODELS USING A Reverberation Chamber MEASUREME...Emulation OF 3gpp Scme CHANNEL MODELS USING A Reverberation Chamber MEASUREME...
Emulation OF 3gpp Scme CHANNEL MODELS USING A Reverberation Chamber MEASUREME...IJERA Editor
 
Nonlinear filtering approaches to field mapping by sampling using mobile sensors
Nonlinear filtering approaches to field mapping by sampling using mobile sensorsNonlinear filtering approaches to field mapping by sampling using mobile sensors
Nonlinear filtering approaches to field mapping by sampling using mobile sensorsijassn
 
Energy aware model for sensor network a nature inspired algorithm approach
Energy aware model for sensor network  a nature inspired algorithm approachEnergy aware model for sensor network  a nature inspired algorithm approach
Energy aware model for sensor network a nature inspired algorithm approachijdms
 
IRJET- Performance Analysis of Energy Efficient Clustering Protocol using TAB...
IRJET- Performance Analysis of Energy Efficient Clustering Protocol using TAB...IRJET- Performance Analysis of Energy Efficient Clustering Protocol using TAB...
IRJET- Performance Analysis of Energy Efficient Clustering Protocol using TAB...IRJET Journal
 
ssnow_manuscript_postreview
ssnow_manuscript_postreviewssnow_manuscript_postreview
ssnow_manuscript_postreviewStephen Snow
 
A FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKS
A FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKSA FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKS
A FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKScsandit
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)IJERD Editor
 
Performance Evaluation of Consumed Energy-Type-Aware Routing (CETAR) For Wire...
Performance Evaluation of Consumed Energy-Type-Aware Routing (CETAR) For Wire...Performance Evaluation of Consumed Energy-Type-Aware Routing (CETAR) For Wire...
Performance Evaluation of Consumed Energy-Type-Aware Routing (CETAR) For Wire...ijwmn
 

What's hot (19)

M.E Computer Science Remote Sensing Projects
M.E Computer Science Remote Sensing ProjectsM.E Computer Science Remote Sensing Projects
M.E Computer Science Remote Sensing Projects
 
M.Phil Computer Science Remote Sensing Projects
M.Phil Computer Science Remote Sensing ProjectsM.Phil Computer Science Remote Sensing Projects
M.Phil Computer Science Remote Sensing Projects
 
M phil-computer-science-remote-sensing-projects
M phil-computer-science-remote-sensing-projectsM phil-computer-science-remote-sensing-projects
M phil-computer-science-remote-sensing-projects
 
Design and Fabrication of a Two Axis Parabolic Solar Dish Collector
Design and Fabrication of a Two Axis Parabolic Solar Dish CollectorDesign and Fabrication of a Two Axis Parabolic Solar Dish Collector
Design and Fabrication of a Two Axis Parabolic Solar Dish Collector
 
Contel.final
Contel.finalContel.final
Contel.final
 
A Transmission Range Based Clustering Algorithm for Topology Control Manet
A Transmission Range Based Clustering Algorithm for Topology Control ManetA Transmission Range Based Clustering Algorithm for Topology Control Manet
A Transmission Range Based Clustering Algorithm for Topology Control Manet
 
H44093641
H44093641H44093641
H44093641
 
Latest 2016 IEEE Projects | 2016 Final Year Project Titles - 1 Crore Projects
Latest 2016 IEEE Projects | 2016 Final Year Project Titles - 1 Crore ProjectsLatest 2016 IEEE Projects | 2016 Final Year Project Titles - 1 Crore Projects
Latest 2016 IEEE Projects | 2016 Final Year Project Titles - 1 Crore Projects
 
Emulation OF 3gpp Scme CHANNEL MODELS USING A Reverberation Chamber MEASUREME...
Emulation OF 3gpp Scme CHANNEL MODELS USING A Reverberation Chamber MEASUREME...Emulation OF 3gpp Scme CHANNEL MODELS USING A Reverberation Chamber MEASUREME...
Emulation OF 3gpp Scme CHANNEL MODELS USING A Reverberation Chamber MEASUREME...
 
Nonlinear filtering approaches to field mapping by sampling using mobile sensors
Nonlinear filtering approaches to field mapping by sampling using mobile sensorsNonlinear filtering approaches to field mapping by sampling using mobile sensors
Nonlinear filtering approaches to field mapping by sampling using mobile sensors
 
Energy aware model for sensor network a nature inspired algorithm approach
Energy aware model for sensor network  a nature inspired algorithm approachEnergy aware model for sensor network  a nature inspired algorithm approach
Energy aware model for sensor network a nature inspired algorithm approach
 
IRJET- Performance Analysis of Energy Efficient Clustering Protocol using TAB...
IRJET- Performance Analysis of Energy Efficient Clustering Protocol using TAB...IRJET- Performance Analysis of Energy Efficient Clustering Protocol using TAB...
IRJET- Performance Analysis of Energy Efficient Clustering Protocol using TAB...
 
ssnow_manuscript_postreview
ssnow_manuscript_postreviewssnow_manuscript_postreview
ssnow_manuscript_postreview
 
eeca
eecaeeca
eeca
 
A FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKS
A FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKSA FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKS
A FAST FAULT TOLERANT PARTITIONING ALGORITHM FOR WIRELESS SENSOR NETWORKS
 
International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)International Journal of Engineering Research and Development (IJERD)
International Journal of Engineering Research and Development (IJERD)
 
Ieee 2013 matlab abstracts part b
Ieee 2013 matlab abstracts part bIeee 2013 matlab abstracts part b
Ieee 2013 matlab abstracts part b
 
Ahi terahertz 1
Ahi terahertz 1Ahi terahertz 1
Ahi terahertz 1
 
Performance Evaluation of Consumed Energy-Type-Aware Routing (CETAR) For Wire...
Performance Evaluation of Consumed Energy-Type-Aware Routing (CETAR) For Wire...Performance Evaluation of Consumed Energy-Type-Aware Routing (CETAR) For Wire...
Performance Evaluation of Consumed Energy-Type-Aware Routing (CETAR) For Wire...
 

Viewers also liked (16)

Jennifer's Resume
Jennifer's ResumeJennifer's Resume
Jennifer's Resume
 
Trabajo 3 de calculo 2
Trabajo 3 de calculo 2Trabajo 3 de calculo 2
Trabajo 3 de calculo 2
 
Exposi10
Exposi10Exposi10
Exposi10
 
MOHAMMED SIRAJUDDIN - IT
MOHAMMED SIRAJUDDIN - ITMOHAMMED SIRAJUDDIN - IT
MOHAMMED SIRAJUDDIN - IT
 
FYEE 2015 Paper
FYEE 2015 PaperFYEE 2015 Paper
FYEE 2015 Paper
 
Successfully Managing Your Property and Share Portfolio
Successfully Managing Your Property and Share PortfolioSuccessfully Managing Your Property and Share Portfolio
Successfully Managing Your Property and Share Portfolio
 
Wine Town Florence 2013
Wine Town Florence 2013Wine Town Florence 2013
Wine Town Florence 2013
 
nr3809
nr3809nr3809
nr3809
 
Simple funny pranks
Simple funny pranksSimple funny pranks
Simple funny pranks
 
Lkb by ah
Lkb by ahLkb by ah
Lkb by ah
 
Digital marketing course
Digital marketing courseDigital marketing course
Digital marketing course
 
How to Get More Online Readers
How to Get More Online ReadersHow to Get More Online Readers
How to Get More Online Readers
 
20150917交通部:「104年中秋節暨國慶日連續假期疏運計畫」報告
20150917交通部:「104年中秋節暨國慶日連續假期疏運計畫」報告20150917交通部:「104年中秋節暨國慶日連續假期疏運計畫」報告
20150917交通部:「104年中秋節暨國慶日連續假期疏運計畫」報告
 
How to Win Over Gen Y
How to Win Over Gen YHow to Win Over Gen Y
How to Win Over Gen Y
 
#VisitCool Launch Weekend presentation
#VisitCool Launch Weekend presentation#VisitCool Launch Weekend presentation
#VisitCool Launch Weekend presentation
 
ROUTINE
ROUTINEROUTINE
ROUTINE
 

Similar to EAMTA_VLSI Architecture Design for Particle Filtering in

EffectiveOcclusion Handling for Fast Correlation Filter-based Trackers
EffectiveOcclusion Handling for Fast Correlation Filter-based TrackersEffectiveOcclusion Handling for Fast Correlation Filter-based Trackers
EffectiveOcclusion Handling for Fast Correlation Filter-based TrackersEECJOURNAL
 
EMBC'13 Poster Presentation on "A Bio-Inspired Cooperative Algorithm for Dist...
EMBC'13 Poster Presentation on "A Bio-Inspired Cooperative Algorithm for Dist...EMBC'13 Poster Presentation on "A Bio-Inspired Cooperative Algorithm for Dist...
EMBC'13 Poster Presentation on "A Bio-Inspired Cooperative Algorithm for Dist...Md Kafiul Islam
 
Comparative study between metaheuristic algorithms for internet of things wir...
Comparative study between metaheuristic algorithms for internet of things wir...Comparative study between metaheuristic algorithms for internet of things wir...
Comparative study between metaheuristic algorithms for internet of things wir...IJECEIAES
 
Particle Swarm Optimization Based QoS Aware Routing for Wireless Sensor Networks
Particle Swarm Optimization Based QoS Aware Routing for Wireless Sensor NetworksParticle Swarm Optimization Based QoS Aware Routing for Wireless Sensor Networks
Particle Swarm Optimization Based QoS Aware Routing for Wireless Sensor Networksijsrd.com
 
Ieee transactions 2018 topics on wireless communications for final year stude...
Ieee transactions 2018 topics on wireless communications for final year stude...Ieee transactions 2018 topics on wireless communications for final year stude...
Ieee transactions 2018 topics on wireless communications for final year stude...tsysglobalsolutions
 
Congestion Control in Manets Using Hybrid Routing Protocol
Congestion Control in Manets Using Hybrid Routing ProtocolCongestion Control in Manets Using Hybrid Routing Protocol
Congestion Control in Manets Using Hybrid Routing ProtocolIOSR Journals
 
Congestion Control in Manets Using Hybrid Routing Protocol
Congestion Control in Manets Using Hybrid Routing ProtocolCongestion Control in Manets Using Hybrid Routing Protocol
Congestion Control in Manets Using Hybrid Routing ProtocolIOSR Journals
 
Data mining projects topics for java and dot net
Data mining projects topics for java and dot netData mining projects topics for java and dot net
Data mining projects topics for java and dot netredpel dot com
 
A Robust Topology Control Solution For The Sink Placement Problem In WSNs
A Robust Topology Control Solution For The Sink Placement Problem In WSNsA Robust Topology Control Solution For The Sink Placement Problem In WSNs
A Robust Topology Control Solution For The Sink Placement Problem In WSNsJim Webb
 
Nearest Adjacent Node Discovery Scheme for Routing Protocol in Wireless Senso...
Nearest Adjacent Node Discovery Scheme for Routing Protocol in Wireless Senso...Nearest Adjacent Node Discovery Scheme for Routing Protocol in Wireless Senso...
Nearest Adjacent Node Discovery Scheme for Routing Protocol in Wireless Senso...IOSR Journals
 
Shortest path algorithm for data transmission in wireless ad hoc sensor networks
Shortest path algorithm for data transmission in wireless ad hoc sensor networksShortest path algorithm for data transmission in wireless ad hoc sensor networks
Shortest path algorithm for data transmission in wireless ad hoc sensor networksijasuc
 
Ieee transactions 2018 on wireless communications Title and Abstract
Ieee transactions 2018 on wireless communications Title and AbstractIeee transactions 2018 on wireless communications Title and Abstract
Ieee transactions 2018 on wireless communications Title and Abstracttsysglobalsolutions
 
IRJET- Sink Mobility based Energy Efficient Routing Protocol for Wireless Sen...
IRJET- Sink Mobility based Energy Efficient Routing Protocol for Wireless Sen...IRJET- Sink Mobility based Energy Efficient Routing Protocol for Wireless Sen...
IRJET- Sink Mobility based Energy Efficient Routing Protocol for Wireless Sen...IRJET Journal
 
FMADM SYSTEM FOR MANET ENVIRONMENT
FMADM SYSTEM FOR MANET ENVIRONMENTFMADM SYSTEM FOR MANET ENVIRONMENT
FMADM SYSTEM FOR MANET ENVIRONMENTpijans
 
FMADM SYSTEM FOR MANET ENVIRONMENT
FMADM SYSTEM FOR MANET ENVIRONMENTFMADM SYSTEM FOR MANET ENVIRONMENT
FMADM SYSTEM FOR MANET ENVIRONMENTpijans
 
Remote Sensing IEEE 2015 Projects
Remote Sensing IEEE 2015 ProjectsRemote Sensing IEEE 2015 Projects
Remote Sensing IEEE 2015 ProjectsVijay Karan
 

Similar to EAMTA_VLSI Architecture Design for Particle Filtering in (20)

EffectiveOcclusion Handling for Fast Correlation Filter-based Trackers
EffectiveOcclusion Handling for Fast Correlation Filter-based TrackersEffectiveOcclusion Handling for Fast Correlation Filter-based Trackers
EffectiveOcclusion Handling for Fast Correlation Filter-based Trackers
 
EMBC'13 Poster Presentation on "A Bio-Inspired Cooperative Algorithm for Dist...
EMBC'13 Poster Presentation on "A Bio-Inspired Cooperative Algorithm for Dist...EMBC'13 Poster Presentation on "A Bio-Inspired Cooperative Algorithm for Dist...
EMBC'13 Poster Presentation on "A Bio-Inspired Cooperative Algorithm for Dist...
 
Comparative study between metaheuristic algorithms for internet of things wir...
Comparative study between metaheuristic algorithms for internet of things wir...Comparative study between metaheuristic algorithms for internet of things wir...
Comparative study between metaheuristic algorithms for internet of things wir...
 
Particle Swarm Optimization Based QoS Aware Routing for Wireless Sensor Networks
Particle Swarm Optimization Based QoS Aware Routing for Wireless Sensor NetworksParticle Swarm Optimization Based QoS Aware Routing for Wireless Sensor Networks
Particle Swarm Optimization Based QoS Aware Routing for Wireless Sensor Networks
 
Ieee transactions 2018 topics on wireless communications for final year stude...
Ieee transactions 2018 topics on wireless communications for final year stude...Ieee transactions 2018 topics on wireless communications for final year stude...
Ieee transactions 2018 topics on wireless communications for final year stude...
 
Congestion Control in Manets Using Hybrid Routing Protocol
Congestion Control in Manets Using Hybrid Routing ProtocolCongestion Control in Manets Using Hybrid Routing Protocol
Congestion Control in Manets Using Hybrid Routing Protocol
 
Congestion Control in Manets Using Hybrid Routing Protocol
Congestion Control in Manets Using Hybrid Routing ProtocolCongestion Control in Manets Using Hybrid Routing Protocol
Congestion Control in Manets Using Hybrid Routing Protocol
 
Data mining projects topics for java and dot net
Data mining projects topics for java and dot netData mining projects topics for java and dot net
Data mining projects topics for java and dot net
 
A Robust Topology Control Solution For The Sink Placement Problem In WSNs
A Robust Topology Control Solution For The Sink Placement Problem In WSNsA Robust Topology Control Solution For The Sink Placement Problem In WSNs
A Robust Topology Control Solution For The Sink Placement Problem In WSNs
 
Nearest Adjacent Node Discovery Scheme for Routing Protocol in Wireless Senso...
Nearest Adjacent Node Discovery Scheme for Routing Protocol in Wireless Senso...Nearest Adjacent Node Discovery Scheme for Routing Protocol in Wireless Senso...
Nearest Adjacent Node Discovery Scheme for Routing Protocol in Wireless Senso...
 
cec2013
cec2013cec2013
cec2013
 
Shortest path algorithm for data transmission in wireless ad hoc sensor networks
Shortest path algorithm for data transmission in wireless ad hoc sensor networksShortest path algorithm for data transmission in wireless ad hoc sensor networks
Shortest path algorithm for data transmission in wireless ad hoc sensor networks
 
Ieee transactions 2018 on wireless communications Title and Abstract
Ieee transactions 2018 on wireless communications Title and AbstractIeee transactions 2018 on wireless communications Title and Abstract
Ieee transactions 2018 on wireless communications Title and Abstract
 
IRJET- Sink Mobility based Energy Efficient Routing Protocol for Wireless Sen...
IRJET- Sink Mobility based Energy Efficient Routing Protocol for Wireless Sen...IRJET- Sink Mobility based Energy Efficient Routing Protocol for Wireless Sen...
IRJET- Sink Mobility based Energy Efficient Routing Protocol for Wireless Sen...
 
Matlab 2013 14 papers astract
Matlab 2013 14 papers astractMatlab 2013 14 papers astract
Matlab 2013 14 papers astract
 
FMADM SYSTEM FOR MANET ENVIRONMENT
FMADM SYSTEM FOR MANET ENVIRONMENTFMADM SYSTEM FOR MANET ENVIRONMENT
FMADM SYSTEM FOR MANET ENVIRONMENT
 
FMADM SYSTEM FOR MANET ENVIRONMENT
FMADM SYSTEM FOR MANET ENVIRONMENTFMADM SYSTEM FOR MANET ENVIRONMENT
FMADM SYSTEM FOR MANET ENVIRONMENT
 
Nonlinear Controller for the Laser Fiber Using PID Controller
Nonlinear Controller for the Laser Fiber Using PID ControllerNonlinear Controller for the Laser Fiber Using PID Controller
Nonlinear Controller for the Laser Fiber Using PID Controller
 
L1102017479
L1102017479L1102017479
L1102017479
 
Remote Sensing IEEE 2015 Projects
Remote Sensing IEEE 2015 ProjectsRemote Sensing IEEE 2015 Projects
Remote Sensing IEEE 2015 Projects
 

EAMTA_VLSI Architecture Design for Particle Filtering in

  • 1. VLSI Architecture Design for Particle Filtering in Real-time A. Pasciaroni∗†, J. A. Rodr´ıguez†, F. Masson∗†, P. Juli´an∗†, E. Nebot‡ ∗Dep. Ing. El´ectrica y Computadoras, Universidad Nacional del Sur Av. Alem 1253, Bah´ıa Blanca, Argentina †CONICET, Argentina ‡Australian Centre for Field Robotics, University of Sydney, Australia Abstract—Particle Filter is an algorithm that provides system state estimation even for non-linear and non-gaussian systems. For applications that require a large number of particles, real time constraint is hard to accomplish since the algorithm is computationally expensive and the resampling step becomes a bottleneck. In this work, a VLSI architecture for particle filtering in real time is presented. The proposed design implements a fraction of the processing using piecewise linear functions and allocates them as global resources. In this way, a large number of processing elements (PE) working in parallel can be instantiated in the design. An example based on a range-only localization using Radio-Frecuency identification (RFID) tags is developed to illustrate the approach. The received signal strength indicator (RSSI) is used to estimate the distance between transmitter and receiver. A VHDL RTL model of the processing data flow is implemented and compared to Matlab simulations showing similar results. Index Terms—Particle Filter, VLSI Design, RFID, RTL. I. INTRODUCTION Particle Filters (PF) [1] are a method to perform statistical dynamic state estimation. The probability density function of a given state is represented by a set of weighted entities or particles which is updated iteratively according to sensor mea- surements and a dynamic system model. The three main steps of the particle filter are: sampling, update and resampling. This last step presents high data dependency between particles, becoming the major bottleneck in the execution time of the filter. There exist applications that require real-time estimation of non-linear and non-gaussian systems as robot localization and visual tracking [2], [3], [4]. These applications are well suited for particle filtering but a large number of particles is required to provide accurate estimations. Since the PF algorithm is computationally expensive and the resampling step cannot be fully parallelized, particle filter computation in real time is limited by the available computational resources. In this context, a VLSI implementation that exploits algorithm data level parallelism will allow particle filtering at real time. Previous works have addressed particle filter implementa- tions for real time applications [5] [6] [7] [8]. In [5] a PF architecture composed of multiple processing elements and a central unit for the bearing-only tracking problem is presented and implemented in FPGA. Particle filter steps are performed locally on each processing element (PE). After resampling, a central unit controls the particle exchange among processors in order to reduce performance degradation. Several commu- nication schemes are introduced including a fixed particle exchange among processors. In [7] a VLSI design of the processing element is presented which also includes a pipeline dataflow that deals with logic blocks of variable latency. In [8] a central unit that performs communication schemes, intro- duced in [5], for an architecture composed of four processing elements is designed and a VLSI implementation is presented. In [6] a parallel pipelined design is presented. The number of replicated pipeline stages is variable. Taking into account the rate of each stage an optimal number of replicated stages is determined. However, a VLSI implementation that takes full advantage of the data level parallelism present in the algorithm, has not been developed yet. In this work a VLSI architecture for particle filtering in real time applications is presented. It is composed of processing clusters with one resampling module and an array of PE. Each PE performs several steps of the PF operation that do not present data dependency, in a pipelined fashion. Therefore, if more PE can be instantiated in a given Silicon area, more particles can be effectively processed in parallel, increasing the throughput. Afterwards, resampling modules gather PE outputs so that the resampling is performed in groups. In addition, to reduce the PE area, a fraction of the PE data processing is time-multiplexed so hardware dedicated to this processing is instantiated once and can be shared by multiple PE. The application chosen to illustrate the approach is target tracking based on Received Signal Strength Indicator (RSSI) of Radio Frequency Identification devices (RFID). The paper is organized as follows. Section II presents the localization framework and RSSI sensor model. The archi- tecture and microarchitecture design is presented in section III. Execution time of proposed architecture is presented in section IV. Simulation results comparing the VHDL RTL and Matlab models are presented in section V. Finally, Section VI is dedicated to the conclusions. II. LOCALIZATION FRAMEWORK In sensor networks, Radio Frequency based localization systems have gained importance in those environments where Global Positioning based system (GPS) do not perform well due to poor satellite availability or multiple path issues [9] [10]. This a possible situation for the choosen target appli- cation: trucks localization in opencast mining enviroments
  • 2. 0 10 20 30 40 50 60 −150 −100 −50 0 50 Two Ray Model Distance [m] AverageSignalStrength[dBm] Fig. 1: Two Ray Model for a communication link of 433 Mhz in a rural enviroment. [9]. The RFID technology comprises the receivers, antennas and RFID tags. The tags send their identification number to the receivers. Making use of RSSI it is possible to estimate the distance between a tag and a receiver since RSSI values decrease with distance with a known law. Due to several factors that affect propagation of electromagnetic waves in a medium (refractions, reflections, scattering), the received power vs distance relation varies with the obstacles in the environment, the height and direction of the antenna and also the power of the signal transmitted. This results in a non- biyective and thus multimodal sensor function. Figure 1 shows a typical two-ray propagation model of RF signals [11] for a rural environment and a communication frequency of 433 MHz and transmitter and receiver height of 2.5 m. It shows the average signal strength of the received power versus distance. For a given distance the distribution of RF signal is considered Gaussian and its variance varies with the signal strength [9]. It is possible to observe that for a received power of −70 dBm there exist multiple distance values: 8 m, 15.5 m, 20 m and 43.1 m being one of those the true value of the tag position. This example shows the multi- modal probabilistic density function associated with RFID sensor. RSSI based localization can be performed using the particle filter algorithm. Consider a hypothetical scenario of one RFID tag moving in 2-D and one antenna located at the origin. Let pi k denote the ith particle, where pi k = x ˙x y ˙y ′ . The target system evolution is given by f(pi k−1, vx, vy) =     1 ∆T 0 0 0 1 0 0 0 0 1 ∆T 0 0 0 1     · pi k−1+     0.5 · ∆T 2 0 ∆T 0 0 0.5 · ∆T 2 0 ∆T     · vx vy , (1) where vx and vy are drawn from a uniform distribution U[0, Q]. The pseudocode of the Particle Filter algorithm for the chosen application and for a set of N particles is described below: random initialization of particles; for i ← 1 to N do pi k = f(pi k−1, vx, vy); //sampling di = sqrt(pi k(1) 2 + pi k(3) 2 ); Poti = Fsensor(di ); wi = 1√ 2π·σ2 · exp(−(P oti −P otmeasurement)2 2·σ2 ) //update end [ ˆw, ˆp] = resampling(w, pk); where Potmeasurement is the power measurement of the received signal whose variance is σ2 and Fsensor(d) is the mathematical expression of the Two ray propagation model whose characteristic is shown in Figure 1. Depending on the obstacles present in the enviroment a more complex sensor model can be utilized. For the resampling step there exist several algorithms [12], [5], [13]. Position estimation is computed by the following equation: ˜x = N i=1 ˆpi · ˆwi (2) III. DESIGN A. Architecture The PF filter algorithm does not presents data dependence between particles except for the resampling step. When the number of particles increases the resampling execution time can become a bottleneck. A strategy to reduce the resampling execution time is to divide the total number of particles into groups so parallelism level is increased [5]. Each particle group is processed by a dedicated processor. Since the resam- pling step is sequentially executed, there exists a trade off be- tween the number of processors and the estimation error: as the number of particle groups increases, so does the degradation of the filter [14]. In order to reduce this performance degradation a particle exchange must be performed among processors. In [15] an optimization of the particles exchange procedure is presented. A formal analysis, applying the Kullback-Leibler divergence, proves that the exchange of particles with largest weights between adjacent processors results in better accuracy than a random particle mixing. In [14] this exchange is performed after resampling thus the selection of particles with largest weight is avoided. The analysis of algorithm parallelization has been done in [14] allowing the selection of an optimal configuration. Once one filter iteration has been performed, the estimate of each processor is combined in order to provide a global estimation [15]. The system consists of two modules: the measurement unit and the processing unit. The system block diagram is shown in Fig. 2-a. The measurement unit sets up the RSSI value and computes the reciprocal of the noise variance. The processing unit performs the PF algorithm and provides an estimated position. In order to process thousands of particles in real-time the processing unit architecture must exploit data level parallelism and at the same time take into account the strategy described above. A parallelism level hierarchy is adopted. The first
  • 3. level is performed by introducing multiple processing elements (PEs) each one performing the PF algorithm steps that do not present data dependency. The second level consists in gath- ering PEs in clusters so data input for the resampling step is made up of the processed particle and weight of each PE inside a cluster. For the final estimation of position, the estimate of each cluster is combined as was previously mentioned. Particle exchange among clusters is also performed. The proposed VLSI design implements most area consum- ing operations in external (out of the array) Look-up tables (LUT). These LUTs are taken away from the processing element dataflow and put them as global resources. For each table there is a Broadcast module that sequentially reads the table and performs interpolation. The interpolated value and interpolation address are broadcasted to all PEs through buses. Each PE locally computes its required interpolation address and compares it with the current value in the bus. If an equivalence is found, the corresponding data value is acquired by the PE. Figure 2-b shows a more detailed architecture of the pro- cessing unit. It has 4 clusters with 4 PEs each. Sensor measure- ment and the reciprocal of its variance 1/σ2 are communicated to all PEs. Four global resources are introduced: Square, Sqrt, Sensor and Normal LUT. Each broadcast module has two independent buses: interpolation/address and data/bus. Resampling, pseudo-random number generator (PRNG) and Word-to-memory modules inside each cluster are also in- troduced. All modules are explained in further subsections. Communication among clusters is not shown to simplify the diagram. Each cluster has its own local memory and works without data dependence of others except when the particle exchange is performed. Processing elements belonging to a cluster share local memory. Regarding control logic, each cluster has its own control logic that manages main memory reading and writing and also global control signals. Furthermore, each processing element that integrates a cluster has a dataflow pipeline whose control is distributed. Since each pipeline stage has a variable delay dependent on the time instant when the corresponding value is present in data bus, global pipeline control is not affordable. Therefore, each stage has a local control logic dependent on data events. B. Cluster Operation Architecture cluster operation proceeds as follows: while in execution, each PE inside a cluster reads a particle from memory. Each Broadcast module sequentially reads its corre- sponding LUT, interpolates and broadcasts interpolated value and interpolation address to all PEs. Since the PE dataflow is pipelined, a single table read is utilised to process several particles. Main memory has two ports so memory reading and writing is performed simultaneously. Two arrays, one made up of particles and another one of processed weights from each PE are the input for the Resampling module. Once the arrays have been totally up- dated, resampling is performed. The elements of resampling Fig. 2: a) Block Diagram of the VLSI architecture for proposed tracking system, b) architecture of the processing unit. arrays are processed sequentially. As soon as one element is resampled, it is immediately updated by corresponding PE. Once all data from local memory has been processed communication among clusters is performed. C. LUTs design The functions implemented in the LUTs are: square, square root, two ray propagation model (as shown in Fig. 1) and normal distribution. All of them are evaluated with a piecewise linear function with uniform segmentation. By performing interpolation, a reduction in table size is achieved. At a point x ∈ [a , b], a linear interpolation is calculated as follows: ˜f(x) = f(b) − f(a) b − a · (x − a) + f(a) (3) This operation is performed by the broadcast module shown in Fig. 3. A counter generates 2N+M words where the N most significant bits are used for LUT addressing and the remaining M bits for interpolation. The introduced dataflow is composed of several tabulated functions and interpolations in cascade. When the interpolated value from a broadcast module is captured by the correspond- ing pipeline stage, it becomes the interpolation address for the next tabulated function. It is desirable to find an appropriate word length for LUT addressing, interpolation and function value quantization. This length should maximise the ratio between interpolation address word length and interpolated value word length. At the same time, the approximation errors
  • 4. Fig. 3: Broadcast Module TABLE I: Piecewise Linear Function Setup F unctionN M Q R S Size Kbits Range X Interp. Error Square 9 2 14 17 - 7 [0,40] 5 · 10−4 Sqrt 10 2 11 13 5 11 [0,3200] 3 · 10−4 Sensor 10 2 10 12 1 10 [0,113] 4 · 10−4 Normal 9 1 10 11 3 5 [0,5] 5 · 10−3 should be reduced since they are propagated through the dataflow. In this regard, the accuracy analysis introduced in [16] for practical implementation of piecewise linear functions is adopted. Table I shows the setup chosen for each piecewise linear function implementation where N, M, Q are the number of bits assigned for segmentation, interpolation and function value quantization. R and S are output data resolution and discarded input bits. The error introduced by each interpola- tion, which is calculated as the median of the absolute error over one thousand samples of evaluation interval, i.e., error(x) = mean(| f(x) − finterp(x)) f(x) |) (4) is also included in the table. The normal distribution implementation requires to evaluate normal distributions with different values of variance. Any normal distribution can be obtained from the standard normal distribution. If a distribution with mean µ and variance σ2 must be evaluated for a value t, the following equations allow the calculation using only the standard normal distribution function: z = t − µ σ , (5) pNormal = 1 σ · pStandardNormal(z), (6) where PStandardNormal(z) = 1 √ 2 · π · exp( −z2 2 ). (7) Moreover, as the function is symmetric around the mean, there is only need to store half of the evaluation interval, reducing even more the LUT size. The architecture comprises dual port memories thus the two values for interpolation can be obtained simultaneously. Fig. 4: PE Micro-architecture D. PE Micro Architecture Each PE sequentially performs the two algorithm steps that do not present data dependency: sampling and update. Pro- cessing is divided into several modules in order to implement a level module pipeline: Sampling, Acquisition Square Value, Acquisition Sqrt Value, Acquisition Sensor Value and Acqui- sition Normal Value. Figure 4 shows the pipelined dataflow microarchitecture. 1) Sampling Unit: The sampling unit processes data from main memory current location. Memory word datawidth is 48 bits where each particle component has 12 bits. Range for position and velocity is [−40, 40] m and [−25, 25] m/s. This unit performs a translation in the plane by using a simplified version of the dynamic model detailed in (1). This simplifica- tion allows a reduction in the number of multiplications. For this design the dynamic model is fixed but future designs will consider a programable model. The translated positions and velocities are computed as follows px(k) = px(k − 1) + vx(k − 1) · △T + 1 2 · nx (8) py(k) = py(k − 1) + vy(k − 1) · △T + 1 2 · ny (9) vx(k) = vx(k − 1) + nx (10) vy(k) = vy(k − 1) + ny (11) where nx and ny are drawn from a uniform distribution U[0, W]. Depending on the value of the △T parameter, the W value should be adjusted in order to provide similar accelerations than the original model. The random noise is generated by a linear feedback shift register [LFSR] [17] with internal XORs of 16 bits with reconfigurable seed. This pseudo random number generator is a shared resource inside a cluster. Each PE takes a number at its corresponding turn. The eight most significant bits are used fot the nx component and the eight less significant bits for the ny component. Each component noise is pre-multiplied by the variance value Q. Either Q and △T registers are programmables of 8 bits length. The output of the sampling unit has the same datawidht as its input. 2) Acquisition Value Units: All acquisition units detect when data input is equal to the current value in the interpola- tion address bus. This detection is performed with a bitwise xor operation. When an equivalence is detected the data present in the data bus is acquired.
  • 5. The Acquisition Square Value unit, performs the sum of the inputs squared. When x or y are negative two’s complement is performed. Thus |x| and |y| have 11-bit word length and are compared to the interpolation address bus. Once the squared value is captured for both components, sum is performed with 17 bits output data width. The broadcast module for the Sqrt function provides a 12-bit interpolation address bus. Therefore the 5 less significant bits of x2 + y2 are discarded when Acquisition Sqrt Value unit compares its data input with the value present in the interpolation address bus. The same occurs for the block Acquisition Sensor Value with the less significant bit discarded from its input word. The Acquisition Normal Value Unit generates a word using (5) with µ equal to Potmeasurement. Once an equivalence is detected, the data present in the bus is acquires. Finally it is multiplied by the reciprocal of the standard deviation as stated in (6). The reciprocal of the variance has 8-bit width as well as the power measurement. In order to perform subtraction in (5), the 5 less significant bits of the input word are discarded. The word length after this equation is 16 bits. According to Table I, the tabulated normal function requires 1 interpolation bit, therefore the 6 less significant bits are not taken into account resulting in a 10-bit word length. Once data value is captured by the PE, it is multiplied by the reciprocal of the variance, resulting in a 19-bit word lentgh. E. Resampling unit The resampling algorithm selected for implementation is the modified Independent Metropolis Hasting (IMH) [12] which substitutes division operation for comparison and particles and their weights are sequentially processed. The algorithm is summarized in the following pseudocode: wprev = wk 1 ; for i ← 2 to NUMPART ICLES do u ∼ U(0, 1); if ( u · wprev > wk i ) then wprev = wprev; resample = 1; else wprev = wk i ; resample = 0; end end Algorithm 1: Implemented resampling algorithm Figure 5 shows the architecture of resampling and word-to- memory modules. The particle array is fulfilled with output particles from sampling unit. Both arrays particle array and weight array must be fully updated in order to initiate resampling operation. First particle of the whole set is always resampled. Subsequent particles will be stored in memory depending on the comparison among their weight and wprev. The random number generator is implemented with a LFSR of 16 bits. The resample signal controls the data stored in memory. If value of resample is 1, the data present in the particle register is written to memory else actual processed particle is selected and wprev is updated. In order to sinchronize the translated particle with the pipeline time schedule, it must be delayed as many times as the Fig. 5: Word-To-memory Architecture number of pipeline stages in between the sampling unit and the Acquisition normal value unit. Each PE reads a particle from a memory location and, once the particle is resampled, word- to-memory unit stores it at the same location. Since a dual port memory is considered and the architecture is pipelined, memory reading and writing is done simultaneously. Control is achieved with a read address and write adress counter. The former is dependent on control signals from sampling unit and the latter is dependent on control signals from word-to- memory unit. IV. EXECUTION TIME Since the execution time of each module is variable, each PE will complete its processing at different times. The resam- pling module begins operation when the first PE has finished processing its particle. Figure 6 shows the execution time of the dataflow for a cluster made up of two processing elements. Pipeline delay between output data values is given by the slowest stage. In the presented design this corresponds to the stage with the largest interpolation bus address, since it takes 2N+M cycles in order to acquire the last interpolated value. This is the case of the Sqrt function. In the worst case execution time, a new particle is processed every 4096 cycles. As resampling takes one cycle to process each particle, the number of cycles to finish the resampling operation depends on the number of PEs in a cluster. Therefore, the last element of the resampling array will be updated every 2N+M + P, where P is the number of PEs in the cluster. V. RESULTS A. Simulation Results A VHDL RTL model of the processing element was de- veloped. The implementation flow was the following: first a fixed point Matlab implementation of the processing element described above was generated and compared to its floating point counterpart to prove its proper operation. Second, an RTL model that matches the fixed point Matlab implemen- tation was developed. At this stage of the implementation,
  • 6. Fig. 6: Filter execution time. 0 2 4 6 8 10 12 14 16 18 20 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 Distance [m] Weight Matlab Model vs RTL Model Matlab Model RTL Model Fig. 7: Weights vs. distance. the RSSI measurement, the reciprocal of the noise standard deviation and the position estimation are generated off-line. Figure 7 shows the distribution of weights vs distance for the floating point Matlab implementation and the RTL model, in the case where the measured power is −55.42 dBm and σ = 2.5 dBm. Since RSSI measurements are 8-bit quantized, the normal distribution is also quantized. It can be noticed that the RTL model provides similar results to the floating point Matlab implementation. A 2-D tracking scenario can be simulated to show the dynamic performance. In this case, the fixed point Matlab implementation is used, instead of the RTL model, in order to reduce simulation time. The scenario is composed of a unit moving at nearly constant velocity and three fixed antennas s1, s2 and s3 placed at positions: [0, 0], [−20, 0], [0, 20], respectively. The position of the target unit evolves with time according to (1). The mobile initial state is x0 = [−8m, 12m/s, 10m, −2m/s] and ∆T = 0.1s. The total number of particles used is 4096, which are uniformly distributed on a region delimited by the intervals [−20m, 20m] and [0, π] radians at the beginning of the simulation. Particle velocities have been randomly initialised with uniform distribution in the interval [17, 7] for ˙x and [7, −3] for ˙y. Figure 8 shows the trajectory of the target unit (green line) and simulation results for the Matlab model and the RTL model, in red and black lines, respectively. Both models provide very close results. −8 −6 −4 −2 0 2 4 6 8 0 5 10 15 20 Floating Point Matlab Model Fixed Point Matlab Model Tartet trajectory Antenna Fig. 8: Tracking of a moving target with three antennas. TABLE II: Synthesis Results Module Area [µm2] Sampling 37453 Acq. Square Value 6144 Acq. Sqrt Value 2268 Acq. Sensor Value 1932 Acq. Normal Value 13293 Total Area EP 87086 B. Synthesis Results The RTL model of the processing element described in section III was synthesized using Synopsis DC Compiler and 0.13µm CMOS technology. Since the array is composed of several processing elements it is desirable to have the area required for this basic unit. Table II shows the area of the processing element and its modules. VI. CONCLUSIONS A VLSI architecture for particle filtering in real time was presented. This architecture exploits the data level parallelism in the algorithm and also takes into account performance degradation due to resampling parallelization. Introducing global resources allows an increase in concurrent hardware. Processing dataflow was described along with a piecewise lin- ear function implementation. An RTL model of the proposed design was generated. Simulation shows that the architecture correctly implements the PF adapted to the specific applica- tion. Further work is needed to choose an optimal number of PEs per cluster. VII. ACKNOWLEDGMENTS The results of this paper were partially supported by PICT 2010-2657 3D Gigascale Integrated Circuits for Nonlinear Computation, Filter and Fusion with Applications in Industrial Field Robotics of Agencia Nacional de Promoci´on Cient´ıfica y Tecnol´ogica (ANPCyT) of the Argentine Ministry of Science and Technology (MINCYT). REFERENCES [1] N. Gordon, D. Salmond, and A. F. M. Smith, “Novel approach to nonlinear/non-gaussian bayesian state estimation,” IEE Proc. Of Radar and Signal Processing, vol. 140, no. 2, pp. 107–113, 1993. [2] M. Isard and A. Blake, “Condensation - conditional density propagation for visual tracking,” International Journal of Computer Vision, vol. 29, no. 1, pp. 5–28, 1998.
  • 7. [3] D. Fox, “Kld-sampling: Adaptive particle filters and mobile robot localization,” in Advances in Neural Information Processing Systems 14, vol. 2, 2001, pp. 713–720. [4] D. F. C Kwok and M. Meila, “Real-time particle filters,” Proceedings of the IEEE, vol. 92, no. 3, pp. 469–484, Mar 2004. [5] M. Bolic, P. M. Djuric, and S. Hong, “Resampling algorithms and architectures for distributed particle filters,” IEEE Transactions on Signal Processing, vol. 53, no. 7, pp. 2442–2450, July 2005. [6] A. C. Sankaranarayanan, A. Srivastava, and R. Chellappa, “Algorithmic and architectural optimizations for computationally efficient particle filtering,” IEEE transcactions on Image Processing, vol. 17, no. 5, pp. 737–748, May 2008. [7] S.-S. Chin and S. Hong, “Vlsi design of high-throughput processing element for real-time particle filtering,” in Signals, Circuits and Systems, vol. 2, 2003, pp. 617–620. [8] S. Hong, S. S. Chin, M. Boli, and P. M. Djuric, “Design and implemen- tation of flexible resampling mechanism for high-speed parallel particle filters,” Journal of VLSI signal processing systems for signal, image and video technology, vol. 44, pp. 47–62, 2006. [9] G. Kloos, J. E. Guivant, E. M. Nebot, and F. Masson, “Range based localisation using rf and the application to mining safety,” in Proceedings of the 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems, Oct 2006, pp. 1304–1311. [10] S. Sanudo and F. R. Masson, “Desempe˜no del filtro de part´ıculas acotado en una aplicaci´on de localizaci´on y seguimiento de camiones en una explotaci´on minera,” in XIV Reunion de Trabajo en Procesamiento de la Informacion y Control, vol. 1, 2011, pp. 712–717. [11] H. Xia, H. L. Bertoni, L. Maciel, A. Lindsay-Stewart, and R. Rowe, “Radio propagation characteristics for line-of-sight microcellular and personal communications,” IEEE Transactions on Antennas and Propa- gation, vol. 41, no. 10, pp. 1439–1447, Oct 1993. [12] L. Miao, J. J. Zhang, C. Chakrabarti, and A. Papandreou-Suppappola, “Algorithm and parallel implementation of particle filtering and its use in waveform-agile sensing.” Signal Processing Systems, vol. 65, no. 2, pp. 211–227. [13] M. Bolic, P. M. Djuric, and S. Hong, “Resampling algorithms for particle filters: A computational complexity perspective,” EURASIP Journal on Applied Signal Processing, vol. 15, pp. 2267–2277, 2004. [14] A. Pasciaroni, S. Sanudo, J. Rodriguez, F. Masson, and P. Julian, “Modelling and analysis of parallel particle filters,” in XV Reunion de Trabajo en Procesamiento de la Informacion y Control, vol. 1, no. 1, 2013, pp. 1–6. [15] B. Balasingam, M. Bolic, P. Djuric, and J. Miguez, “Efficient distributed resampling for particle filters,” in IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), 2011, pp. 3772–3775. [16] O. Lischitz, P. Julian, J. Rodriguez, and O. Agamennoni, “Accuracy analysis for an on-chip digital pwl realization,” in XIV Reunion de Trabajo en Procesamiento de la Informacion y Control, 2011, pp. 429– 434. [17] Z. Barzilai, D. Coppersmith, and A. L. Rosenberg, “Exhaustive gen- eration of bit patterns with applications to vlsi self-testing,” IEEE Transactions on Computers, vol. C-32, no. 2, pp. 190–194, Feb 1983.